- Update README.md to prioritize Primary Agents (Sisyphus, Hephaestus, Prometheus, Atlas, Junior)
- Update overview.md and features.md to distinguish Primary Agents from Specialist Subagents
- Update Librarian and Multimodal-Looker models in docs to match source code fallback chains
- Ensure accuracy of agent descriptions and roles
Renames all environment variable gates from the old oh-my-codex (OMX) prefix
to the correct oh-my-openagent (OMO) prefix:
- OMX_OPENCLAW -> OMO_OPENCLAW
- OMX_OPENCLAW_COMMAND -> OMO_OPENCLAW_COMMAND
- OMX_OPENCLAW_DEBUG -> OMO_OPENCLAW_DEBUG
- OMX_OPENCLAW_COMMAND_TIMEOUT_MS -> OMO_OPENCLAW_COMMAND_TIMEOUT_MS
Adds TDD tests verifying:
- OMO_OPENCLAW=1 is required for activation
- Old OMX_OPENCLAW env var is not accepted
Ports the OMX OpenClaw module into oh-my-openagent as a first-class integration.
This integration allows forwarding internal events (session lifecycle, tool execution) to external gateways (HTTP or command-based).
- Added `src/openclaw` directory with implementation:
- `dispatcher.ts`: Handles HTTP/Command dispatching with interpolation
- `types.ts`: TypeScript definitions
- `client.ts`: Main entry point `wakeOpenClaw`
- `index.ts`: Public API
- Added `src/config/schema/openclaw.ts` for Zod schema validation
- Updated `src/config/schema/oh-my-opencode-config.ts` to include `openclaw` config
- Added `src/hooks/openclaw-sender/index.ts` to listen for events
- Registered the hook in `src/plugin/hooks/create-session-hooks.ts`
- Added unit tests in `src/openclaw/__tests__`
Events handled:
- `session-start` (via `session.created`)
- `session-end` (via `session.deleted`)
- `session-idle` (via `session.idle`)
- `ask-user-question` (via `tool.execute.before` for `ask_user_question`)
- `stop` (via `tool.execute.before` for `stop-continuation` command)
Keep installer, config detection, schema generation, and publish workflows aligned with the long-lived oh-my-opencode package so this release does not split across two npm names.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Reject invalid --opencode-go values during non-TUI installs and detect existing OpenCode Go usage from the generated oh-my-opencode config so updates preserve the right defaults.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Keep install-time and runtime model tables in sync, stop OpenAI-only misrouting when OpenCode Go is present, and add valid OpenAI fallbacks for atlas, metis, and sisyphus-junior.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Allow legitimate repeated slash commands in long sessions by replacing session-lifetime dedup with a short-lived TTL cache.
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Keep persisted LINE#ID anchors working after strict whitespace hashing by falling back to the legacy hash for validation-only lookups.
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The header line was showing all extensions unioned together which was
redundant with the per-server detail lines below and caused line overflow.
Status mode also simplified to just show server count.
Replace the hardcoded 4-server list in doctor LSP check with getAllServers()
from server-resolution.ts, which covers all 40+ builtin servers plus user
config. Output now shows server count with supported extensions, and verbose
mode expands to per-server detail lines.
Status: LSP 3 servers (.go, .py, .pyi, .ts, .tsx)
Verbose: LSP 3 servers (.go, .py, .pyi, .ts, .tsx)
typescript (.ts, .tsx, .js, .jsx)
pyright (.py, .pyi)
gopls (.go)
Closes#2587
PR #2424 fixed the critical bug (passing client object instead of agent
summaries array), but only included user, project, and raw plugin agents.
This adds the two missing sources:
- OpenCode native config agents (params.config.agent)
- Plugin agents with migrateAgentConfig applied before summary extraction
Ensures Sisyphus has complete awareness of all registered agent sources.
Closes#2386
Co-authored-by: NS Cola <123285105+davincilll@users.noreply.github.com>
Windows CLI tests were not deleting XDG_CONFIG_HOME, making them
fragile in environments where this variable is set. getCliConfigDir()
reads XDG_CONFIG_HOME on all platforms, not just Linux.
Filter out null, undefined, or malformed entries in installed_plugins.json
before accessing properties. Prevents fatal crash on corrupted data.
Addresses cubic-dev-ai review feedback.
mapClaudeModelToOpenCode() returns {providerID, modelID} but opencode
expects model as a string. Both agent loaders now convert to
'providerID/modelID' string format before assigning to config.
Update CLI config manager to detect and handle both legacy (oh-my-opencode)
and new (oh-my-openagent) package names during installation. Migration
will automatically replace old plugin entries with the new name.
🤖 Generated with assistance of OhMyOpenCode
Update build script to generate both oh-my-opencode.schema.json (backward
compatibility) and oh-my-openagent.schema.json (new package name).
Also adds delegate-task-english-directive hook to schema.
🤖 Generated with assistance of OhMyOpenCode
Add centralized plugin identity constants to support migration from
oh-my-opencode to oh-my-openagent. Includes both current and legacy
names for backward compatibility.
🤖 Generated with assistance of OhMyOpenCode
Phase 0 now runs /get-unpublished-changes as single source of truth
instead of manual bash commands. Phase 1 uses its output for grouping.
Layer 2 explicitly references /review-work skill flow.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
disabled_tools was defined in the Zod schema but omitted from
mergeConfigs(), causing project-level config to shadow user-level
disabled_tools instead of merging both sets. Add Set union and
regression test.
Closes#2555
Refine continuation agent resolution to prefer session-state agent fallback while keeping compaction-specific protection. Replace sticky boolean compaction flag with a short-lived timestamp guard so unresolved agents are blocked only during the immediate post-compaction window, avoiding long-lived suppression and preserving existing continuation behavior.
After compaction, message history is truncated and the original agent
(e.g. Prometheus) can no longer be resolved from messages. The todo
continuation enforcer would then inject a continuation prompt with
agent=undefined, causing the host to default to General -- which has
write permissions Prometheus should never have.
Root cause chain:
1. handler.ts had no session.compacted handler (unlike Atlas)
2. idle-event.ts relied on finding a compaction marker in truncated
message history -- the marker disappears after real compaction
3. continuation-injection.ts proceeded when agentName was undefined
because the skipAgents check only matched truthy agent names
4. prometheus-md-only/agent-resolution.ts did not filter compaction
agent from message history fallback results
Fixes:
- Add session.compacted handler that sets hasRecentCompaction state flag
- Replace fragile history-based compaction detection with state flag
- Block continuation injection when agent is unknown post-compaction
- Filter compaction agent in Prometheus agent resolution fallback
Removes in-memory caching so newly created skills mid-session are
immediately available via skill(). Clears the module-level skill cache
before each getAllSkills() call. Pre-provided skills from options are
merged as fallbacks for test compatibility.
OPENCODE_CONFIG_DIR pointing to profiles/ subdirectory caused skills at
~/.config/opencode/skills/ to be invisible. Added getOpenCodeSkillDirs()
with the same parent-dir fallback that getOpenCodeCommandDirs() uses.
Skills now appear as <command> items with / prefix (e.g., /review-work)
instead of <skill> items, making them discoverable alongside regular
slash commands in the skill tool description.
Use namespace import pattern (import * as sender) to prevent cross-file
spy leakage in Bun's shared module state. Move restoreAllMocks to
beforeEach for proper cleanup ordering.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
Add CLI_AGENT_MODEL_REQUIREMENTS entry for sisyphus-junior with
fallback chain: claude-sonnet-4-6 -> kimi-k2.5 -> big-pickle.
🤖 Generated with assistance of OhMyOpenCode
Add mandatory explicit user okay before completing work in Final
Verification Wave. Present consolidated results and wait for user
confirmation before marking tasks complete.
🤖 Generated with assistance of OhMyOpenCode
runBunInstallWithDetails() defaulted to outputMode:"inherit", causing
bun install stdout/stderr to leak into the TUI when callers omitted the
option. Changed default to "pipe" so output is captured silently.
Also fixed stale mock in background-update-check.test.ts: the test was
mocking runBunInstall (unused) instead of runBunInstallWithDetails, and
returning boolean instead of BunInstallResult.
Before provider cache exists (first run), resolveModelForDelegateTask now
returns undefined instead of guessing a model. This lets OpenCode use its
system default model when no model is specified in the prompt body.
User-specified model overrides still take priority regardless of cache state.
Introduce git worktree list --porcelain parsing following upstream opencode patterns. Exports listWorktrees() for full worktree enumeration with branch info alongside existing detectWorktreePath().
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
When provider.list is not available for SDK validation, do not apply the configured ultrawork variant. This prevents models without a max variant from being incorrectly forced to max when ultrawork mode activates.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Prevent the cache test from deleting the user cache directory and add a regression test for that setup path.
Co-authored-by: Codex <noreply@openai.com>
Previously computeLineHash stripped ALL whitespace before hashing, making
indentation changes invisible to hash validation. This weakened the stale-line
detection guarantee, especially for indentation-sensitive files (Python, YAML).
Now only trailing whitespace and carriage returns are stripped, matching
oh-my-pi upstream behavior. Leading indentation is preserved in the hash,
so indentation-only changes correctly trigger hash mismatches.
- Reorder resolveFallbackProviderID: providerHint now checked before global connected-provider cache
- Revert require('bun:test') hack to standard ESM import in fallback-chain-from-models.test.ts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The DEFAULT_CATEGORIES ultrabrain model was updated from openai/gpt-5.3-codex
to openai/gpt-5.4 in a previous commit, but test expectations were not updated.
Updated test expectations in:
- src/plugin-handlers/config-handler.test.ts (lines 560, 620)
- src/agents/utils.test.ts (lines 1119, 1232, 1234, 1301, 1303, 1316, 1318)
background-update-check.ts was using runBunInstall() which defaults to outputMode:"inherit", leaking bun install stdout/stderr into the background session. Reverted to runBunInstallWithDetails({ outputMode: "pipe" }) and explicitly logs result.error on failure.
Restores the accidentally deleted test case asserting that sibling dependencies (e.g. other:"1.0.0") are preserved in package.json after a plugin version sync.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The previous pattern `(-[\w.]+)?` used `\w` which excludes hyphens, causing versions like `1.2.3-alpha-1` and `1.2.3-rc-test` to be misclassified as unpinned tags. Updated both plugin-entry.ts and sync-package-json.ts (which share the definition) to the spec-compliant pattern that allows dot-separated identifiers using [0-9A-Za-z-] and optional build metadata.
Also adds String() coercion before .trim() in sync-package-json.ts to guard against a TypeError if the parsed JSON value for currentVersion is non-string at runtime.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- MCP teardown race: add shutdownGeneration counter to prevent
in-flight connections from resurrecting after disconnectAll
- MCP multi-key disconnect race: replace disconnectedSessions Set
with generation-based Map to track per-session disconnect events
- MCP clients: check shutdownGeneration in stdio/http client
creators before inserting into state.clients
- BackgroundManager: call clearTaskHistoryWhenParentTasksGone after
timer-based task removal in scheduleTaskRemoval and notifyParentSession
- BackgroundManager: clean completedTaskSummaries when parent has
no remaining tasks
- Plugin dispose: remove duplicate tmuxSessionManager.cleanup call
since BackgroundManager.shutdown already handles it via onShutdown
Test expected timers only after allComplete, but H2 fix intentionally
decoupled per-task cleanup from sibling completion state. Updated
assertion to expect timer after individual task notification.
Plugin created managers, hooks, intervals, and process listeners on
every load but had no teardown mechanism. On plugin reload, old
instances remained alive causing cumulative memory leaks.
- Add createPluginDispose() orchestrating shutdown sequence:
backgroundManager.shutdown() → skillMcpManager.disconnectAll() →
disposeHooks()
- Add disposeHooks() aggregator with safe optional chaining
- Wire dispose into index.ts to clean previous instance on reload
- Make dispose idempotent (safe to call multiple times)
Tests: 4 pass, 8 expects
H3: cancelTask(skipNotification=true) now schedules task removal.
Previously the early return path skipped cleanup, leaking task objects
in this.tasks Map permanently. Extracted scheduleTaskRemoval() helper
called from both skipNotification and normal paths.
H2: Per-task completion cleanup timer decoupled from allComplete check.
Previously cleanup timer only ran when ALL sibling tasks completed. Now
each finished task gets its own removal timer regardless of siblings.
H1+C2: TaskHistory.clearAll() added and wired into shutdown(). Added
clearSession() calls on session error/deletion and prune cycles.
taskHistory was the only data structure missed by shutdown().
Tests: 10 pass (3 cancel + 3 completion + 4 history)
H7: Process 'exit'/'SIGINT' listeners registered per-session were
never removed when all sessions disconnected, accumulating handlers.
- Add unregisterProcessCleanup() called in disconnectAll()
H8: Race condition where disconnectSession() during pending connection
left orphan clients in state.clients.
- Add disconnectedSessions Set to track mid-flight disconnects
- Check disconnect marker after connection resolves, close if stale
- Clear marker on reconnection for same session
Tests: 6 pass (3 disconnect + 3 race)
When queryWindowState returned null during session deletion, the
session mapping was deleted but the real tmux pane stayed alive,
creating zombie panes.
- Add closePending/closeRetryCount fields to TrackedSession
- Mark sessions closePending instead of deleting on close failure
- Add retryPendingCloses() called from onSessionCreated and cleanup
- Force-remove mappings after 3 failed retry attempts
- Extract TrackedSessionState helper for field initialization
Tests: 3 pass, 9 expects
Prune interval created inside hook was not exposed for disposal,
preventing cleanup on plugin unload.
- Add dispose() method that clears the prune interval
- Export dispose in hook return type
Tests: 2 pass, 6 expects
setInterval for model availability monitoring was never cleared,
keeping the hook alive indefinitely with no dispose mechanism.
- Add dispose() method to RuntimeFallbackHook that clears interval
- Track intervalId in hook state for cleanup
- Export dispose in hook return type
Tests: 3 pass, 10 expects
Sync call_omo_agent leaked entries in global activeSessionMessages
and activeSessionToolResults Sets when execution threw errors,
since cleanup only ran on success path.
- Wrap session Set operations in try/finally blocks
- Ensure Set.delete() runs regardless of success/failure
- Add guard against double-cleanup
Tests: 2 pass, 14 expects
processedCommands and recentResults Sets grew infinitely because
Date.now() in dedup keys made deduplication impossible and no
session.deleted cleanup existed.
- Extract ProcessedCommandStore with maxSize cap and TTL-based eviction
- Add session cleanup on session.deleted event
- Remove Date.now() from dedup keys for effective deduplication
- Add dispose() for interval cleanup
Tests: 3 pass, 9 expects
- Replace overly broad .includes('anthropic') with exact provider ID
matching against known Anthropic providers (anthropic, google-vertex-
anthropic, aws-bedrock-anthropic) in context-limit-resolver
- Add afterEach cleanup for vision-capable-models cache in look-at
tool tests to prevent cross-test state leakage
When users switch from pinned version to tag in opencode.json (e.g.,
3.10.0 -> @latest), the cache package.json still contains the resolved
version. This causes bun install to reinstall the old version instead
of resolving the new tag.
This adds syncCachePackageJsonToIntent() which updates the cache
package.json to match user intent before running bun install. Uses
atomic writes (temp file + rename) with UUID-based temp names for
concurrent safety.
Critical changes:
- Treat all sync errors as abort conditions (file_not_found,
plugin_not_in_deps, parse_error, write_error) to prevent corrupting
a bad cache state further
- Remove dead code (unreachable revert branch for pinned versions)
- Add tests for all error paths and atomic write cleanup
- New context-limit-resolver.ts with resolveActualContextLimit() shared helper
- Anthropic provider detection now uses .includes('anthropic') instead of hard-coded IDs
- Both context-window-monitor and dynamic-truncator use the shared resolver
- Added missing test cases: Anthropic+1M disabled+cached limit, non-Anthropic without cache
- Check response.error and !response.data after session.get() to fail closed
- Prevents unlimited spawning when SDK returns non-throwing error responses
- Added regression tests for SDK error and missing data scenarios
- Import and inject buildAntiDuplicationSection() in all 3 Prometheus variants (interview-mode, gpt, gemini) and Metis
- Added tests verifying anti-dup section presence in all prompt variants
- Completes anti-duplication coverage for all delegating agents
- parseInteger() now rejects malformed input like '120oops' using /^\d+$/ regex
- New parseActiveValue() validates active flag is exactly '0' or '1'
- Added regression tests for malformed integers, negative values, empty fields, non-binary active flags
Resume GPT sessions when the last assistant reply ends in a permission-seeking tail, while honoring stop-continuation and avoiding duplicate continuation across todo and atlas flows.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Extract types to types.ts
- Extract constants to constants.ts
- Extract session ID helpers to session-id.ts
- Extract recovery logic to recovery.ts
hook.ts reduced from 331 to 164 LOC
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add -m, --model <provider/model> option to oh-my-opencode run command.
Allows users to override the model while keeping the agent unchanged.
Changes:
- Add model?: string to RunOptions interface
- Create model-resolver.ts to parse provider/model format
- Add model-resolver.test.ts with 7 test cases (TDD)
- Add --model CLI option with help text examples
- Wire resolveRunModel in runner.ts and pass to promptAsync
- Export resolveRunModel from barrel (index.ts)
Example usage:
bunx oh-my-opencode run --model anthropic/claude-sonnet-4 "Fix the bug"
bunx oh-my-opencode run --agent Sisyphus --model openai/gpt-5.4 "Task"
- Mock getOmoOpenCodeCacheDir to use temp directories
- Clear real cache files in beforeEach to prevent pollution
- Add top-level beforeEach/afterEach in model-availability.test.ts
- Use mock.module for proper test isolation
- Fixes model-error-classifier, model-availability, connected-providers-cache
- Add try/finally for fake timers cleanup
- Restore real timers in beforeEach/afterEach
- Use enforceMainSessionFilter: false for grace period tests
- Prevent timer state pollution between tests
Implements question-denied session permission rules when creating child
sessions via background task delegation. This prevents subagents from
asking questions by passing explicit permission configuration during
session creation.
🤖 GENERATED WITH ASSISTANCE OF OhMyOpenCode
Retry compaction recovery when model or tool state is still incomplete, and treat reasoning or tool-only assistant progress as valid output so no-text tail recovery does not misfire.
Replace single-package npm/dt badge with shields.io endpoint badge
that combines downloads from both oh-my-opencode and oh-my-openagent.
Endpoint: https://ohmyopenagent.com/api/npm-downloads
The previous keyTrigger ('Work plan created → invoke Momus') was too
vague — Sisyphus would fire Momus on inline plans or todo lists,
causing Momus to REJECT because its input_extraction requires exactly
one .sisyphus/plans/*.md file path.
The updated trigger explicitly states:
- Momus should only be invoked when a plan file exists on disk
- The file path must be the sole prompt content
- Inline plans and todo lists should NOT trigger Momus
When users switch opencode.json from pinned version to tag (e.g., 3.10.0 -> @latest),
the cache package.json still contains the pinned version. This causes bun install
to reinstall the old version instead of resolving the new tag.
This adds syncCachePackageJsonToIntent() which updates the cache package.json
to match the user's declared intent in opencode.json before running bun install.
Also fixes mock.module in test files to include all exported constants,
preventing module pollution across parallel tests.
The GitHub repository was renamed from oh-my-opencode to oh-my-openagent,
but all documentation, scripts, and source code references still pointed
to the old repository name. This caused confusion for users who saw
'oh-my-opencode' in docs but a different repo name on GitHub.
Updated all references across:
- README files (en, ko, ja, zh-cn, ru)
- CONTRIBUTING.md
- docs/ (installation, overview, configuration, etc.)
- Source code (schema URLs, GitHub API calls, issue links)
- Test snapshots
The npm package name remains 'oh-my-opencode' (unchanged).
Fixes: https://x.com/Dhruv14588676/status/2031216617762468348
- Hoist resolveFallbackChainForCallOmoAgent before sync/background branch
so sync executor also receives the fallback chain
- Add fallbackChain parameter to sync-executor with setSessionFallbackChain
- Mock connected-providers-cache in event.model-fallback tests for
deterministic behavior (no dependency on local cache files)
- Update test expectations to account for no-op fallback skip when
normalized current model matches first fallback entry
- Add cache spy isolation for subagent-resolver fallback_models tests
The prompt update from sisyphus.ts was not applied to the gpt-5-4 and
default variant files. This aligns all three sisyphus prompt variants
to use the updated background result handling guidance.
Tighten Background Result Collection instructions so the model waits
for explore/librarian results instead of duplicating their work and
delivering premature answers.
- Remove 'Continue working immediately' which models interpreted as
'do ALL work yourself, ignore agents'
- Clarify step 2: only do DIFFERENT independent work while waiting
- Add explicit step 3: end response when no other work remains
- Add 'not for files you already know' to explore section header
Fixes#2124, fixes#1967
- Dedupe directory-checking logic: findWorkspaceRoot now uses isDirectoryPath helper
- Add early return when no files match extension to avoid starting LSP server unnecessarily
After publishing oh-my-opencode-{platform}, rename package.json and
publish oh-my-openagent-{platform} from the same build artifact.
Download/extract steps now run if either package needs publishing.
- Support binary bun.lockb format by deleting entire file (cannot parse)
- Add workspace check: verify package.json exists before bun install
- Quote paths in error messages for Windows/paths with spaces
Add deprecation notice explaining that getHighVariant() is no longer
used by the hook. Function is kept for backward compatibility and
potential future validation use.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Remove '고민' from MULTILINGUAL_KEYWORDS as it triggers false
positives in everyday Korean speech. The keyword '생각' (thinking)
remains for intentional think-mode activation.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The hook was incorrectly setting output.message.model.modelID to non-existent
variants like gpt-5-nano-high, causing ProviderModelNotFoundError.
OpenCode's native variant system only needs the variant field - it handles
the transformation automatically.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
When a completed session is no longer returned by session.status(),
allStatuses[sessionID] is undefined. Previously this fell through to
a 'still running' log, leaving the task stuck as running forever.
Match the sync-session-poller pattern: only continue (skip completion
check) when sessionStatus EXISTS and is not idle. When undefined,
fall through to validateSessionHasOutput + checkSessionTodos +
tryCompleteTask, same as idle.
Mock constants directly in cache.test to avoid transitive module-cache reuse when importing cache.ts. This removes Date.now query cache-busting and makes the test deterministic across runs.
Align version lookup, invalidation, and bun install with OpenCode's cache directory so updates target the loaded plugin location. Keep dependency declarations intact during invalidation so auto-update can reinstall instead of converging to uninstall.
Load the doctor loaded-version module through a unique test-only specifier so Bun module mocks from system tests cannot leak into the real module assertions in CI.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Add isDirectoryPath helper to lsp-client-wrapper.ts
- Create directory-diagnostics.ts with aggregateDiagnosticsForDirectory
- Update diagnostics-tool.ts with extension parameter for directory paths
- Update Atlas agent prompts to use extension param for directory diagnostics
- Add unit tests for isDirectoryPath and aggregateDiagnosticsForDirectory
Fixes#2362
The publish-main job relied on npm trusted publishing (OIDC) which
broke after the repo rename from oh-my-opencode to oh-my-openagent.
Adding explicit NODE_AUTH_TOKEN restores auth while --provenance
still uses OIDC for Sigstore attestation.
Fixes#2373
npm --provenance validates repository.url against the actual GitHub
repo. Since the repo was renamed to oh-my-openagent, all platform
binary publishes failed with E422 provenance mismatch.
Change multimodal-looker's primary model from gpt-5.3-codex to gpt-5.4 medium
in both runtime and CLI fallback chains.
Changes:
- Runtime chain (src/shared/model-requirements.ts): primary now gpt-5.4
- CLI chain (src/cli/model-fallback-requirements.ts): primary now gpt-5.4
- Updated test expectations in model-requirements.test.ts
- Updated config-manager.test.ts assertion
- Updated model-fallback snapshots
Sisyphus can now fall back through Kimi and OpenAI models when Claude
is unavailable, enabling OpenAI-only users to use Sisyphus directly
instead of being redirected to Hephaestus.
Runtime chain: claude-opus-4-6 max → k2p5 → kimi-k2.5 → gpt-5.4 medium → glm-5 → big-pickle
CLI chain: claude-opus-4-6 max → k2p5 → gpt-5.4 medium → glm-5
- Update DEFAULT_CATEGORIES to use 'openai/gpt-5.4-high' directly instead of separate model + variant
- Add helper functions (isExplicitHighModel, getExplicitHighBaseModel) to preserve explicit high models during fuzzy matching
- Update category resolver to avoid collapsing explicit high models to base model + variant pair
- Update tests to verify explicit high model handling in both background and sync modes
- Update documentation examples to reflect new configuration
🤖 Generated with OhMyOpenCode assistance
Sisyphus-Junior was missing from BuiltinAgentName type, agentSources map,
barrel exports, and AGENT_MODEL_REQUIREMENTS. This caused type inconsistencies
and prevented model-fallback hooks from working for sisyphus-junior sessions.
Closescode-yeongyu/oh-my-openagent#1697
Align runtime defaults, tests, docs, and generated artifacts with the newer GPT-5.4 baseline. Keep think-mode and prompt-routing expectations consistent after the model version bump.
Restructure from 13 scattered XML blocks to 8 dense blocks with 9
named sub-anchors, following OpenAI GPT-5.4 prompting guidance and
Oracle-reviewed context preservation strategy.
Key changes:
- Merge think_first + intent_gate + autonomy into unified <intent>
with domain_guess classification and <ask_gate> sub-anchor
- Add <execution_loop> as central workflow: EXPLORE -> PLAN -> ROUTE ->
EXECUTE_OR_SUPERVISE -> VERIFY -> RETRY -> DONE
- Add mandatory manual QA in <verification_loop> (conditional on
runnable behavior)
- Move <constraints> to position #2 for GPT-5.4 attention pattern
- Add <completeness_contract> as explicit loop exit gate
- Add <output_contract> and <verbosity_controls> per GPT-5.4 guidance
- Add domain_guess (provisional) in intent, finalized in ROUTE after
exploration -- visual domain always routes to visual-engineering
- Preserve all named sub-anchors: ask_gate, tool_persistence,
parallel_tools, tool_method, dependency_checks, verification_loop,
failure_recovery, completeness_contract
- Add skill loading emphasis at intent/route/delegation layers
- Rename EXECUTE to EXECUTE_OR_SUPERVISE to preserve orchestrator
identity with non-execution exits (answer/ask/challenge)
Add MANUAL_QA_MANDATE sections to all three ultrawork prompts (default,
GPT, Gemini). Agents must now define acceptance criteria in TODO/Task items
before implementation, then execute manual QA themselves after completing
work. lsp_diagnostics alone is explicitly called out as insufficient since
it only catches type errors, not functional bugs.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Remove deep parallel delegation section from GPT-5.4 Sisyphus prompt since
it encouraged direct implementation over orchestration. Add zero-tolerance
category domain matching guide to all Sisyphus prompts with visual-engineering
examples. Rewrite visual-engineering category prompt with 4-phase mandatory
workflow (analyze design system, create if missing, build with system, verify)
targeting Gemini's tendency to skip foundational steps.
Add <parallel_tool_calling> and <tool_usage_rules> blocks that GPT-5.4
treats as first-class behavioral contracts. Add parallel-planning question
to <think_first>, strengthen Exploratory route in intent gate, and add
IN PARALLEL annotations to verification loop.
Remove the hardcoded external_directory: "allow" default from
applyToolConfig(). This was silently overriding OpenCode's built-in
default of "ask" and any user-configured external_directory permission.
With this change, external_directory permission is fully controlled by
OpenCode's defaults and user configuration, as intended.
Fixes#1973Fixes#2194
Replace real setTimeout(7000) with fake timer interception in atlas
retry tests (35s -> 227ms). Add missing opencode provider to glm-5
fallback in unspecified-high category.
When atlas injects a boulder continuation via promptAsync() and the
model's response is immediately aborted (MessageAbortedError), OpenCode
fires a burst of session.idle events within milliseconds. Atlas blocks
all of them due to the 5-second cooldown. After the burst, OpenCode
stops generating session.idle events (it's state-change based, not
periodic), leaving the session stuck forever.
Fix: When cooldown blocks an idle event for a boulder session with an
incomplete plan, schedule a one-shot setTimeout (cooldown + 1s) to
re-attempt injection. The timer callback re-checks boulder state, plan
progress, and continuation-stopped flag before injecting. Only one timer
per session is allowed (deduped via pendingRetryTimer field). Timers are
cleaned up on session.deleted and session.compacted events.
Split monolithic hephaestus.ts into directory with model-specific prompt
variants (gpt-5-4.ts, gpt-5-3-codex.ts, gpt.ts) mirroring the
sisyphus-junior pattern. Generic gpt.ts uses pre-codex-tuning prompt as
fallback for non-specific GPT models.
Also adds isGpt5_4Model and isGpt5_3CodexModel helpers to types.ts.
When no worktree is specified in boulder, stop injecting 'Worktree Setup Required'
instructions. When worktree IS present, inject emphatic instructions ensuring the
agent and all subagents operate exclusively within the worktree directory.
WAIT_FOR_SESSION_TIMEOUT_MS of 2ms was too tight for 2 poll iterations
at 1ms intervals — setTimeout precision caused the budget to expire
before the 2nd getTask call. Bumped to 50ms.
When git-master skill is loaded, all git commands are prefixed with the configured env variable (default: GIT_MASTER=1). This enables custom git hooks to detect git-master skill usage. Set to empty string to disable.
The installer was writing Antigravity provider config and calling a no-op addAuthPlugins function. Since opencode-antigravity-auth is no longer auto-installed and OpenCode supports native Google/Gemini auth, all Antigravity-related installer code is dead. Gemini detection now checks for native google provider instead.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
When users have .md agent files in ~/.claude/agents/ with the same names
as builtin agents (e.g. sisyphus.md, hephaestus.md, atlas.md),
loadAgentsFromDir() hardcodes mode: "subagent" for all loaded agents.
Because the config assembly spreads userAgents after builtinAgents:
config.agent = {
...builtinAgents, // sisyphus: mode="primary"
...userAgents, // sisyphus: mode="subagent" ← overrides
}
this causes all primary agents to become subagents. The TUI filters out
subagents, so only non-plugin agents (like "docs") appear in the agent
selector.
Fix:
- Filter out user/project agents that share names with builtin agents
before spreading into config.agent (both sisyphus-enabled and fallback
branches)
- Respect frontmatter `mode` field in .md agent files instead of
hardcoding "subagent"
- Add `mode` to AgentFrontmatter type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix#2289
invalidatePackage() only removed the plugin from USER_CONFIG_DIR/node_modules/,
but bun may install it in CACHE_DIR/node_modules/ on some systems. This left a
stale copy behind, causing the startup toast to keep showing the old version even
after the auto-update completed successfully.
Now both candidate locations are checked and removed so the reinstalled version
is loaded on the next restart.
Strip \r characters from list-panes output to handle Windows-style line
endings. Also relax field count check from 9 to 8 to handle cases where
pane_title is empty or missing, which caused the parser to drop pane
rows and fail to determine the main pane in single-pane sessions.
Fixes#2241
kimi-k2.5-free is no longer available. Remove from all agent and category
fallback chains (sisyphus, multimodal-looker, prometheus, metis, atlas,
writing). Reorder multimodal-looker to: gpt-5.3-codex medium -> k2p5 ->
gemini-3-flash -> glm-4.6v -> gpt-5-nano.
GPT-5.3 Codex has strong multimodal capabilities. Promote it to first
candidate in multimodal-looker fallback chain, with gemini-3-flash
following (matching the ULW pattern of gpt-5.3-codex -> gemini).
The server health check was using /health which returns HTTP 403 since
the endpoint doesn't exist in OpenCode. The correct endpoint is
/global/health as defined in OpenCode's server routes.
Fixes#2260
Remove ohmyopencode.com impersonation warnings from all localized READMEs
and fix markdown table column alignment across ja, ko, ru, zh-cn variants.
Also remove Sisyphus Labs note block from ko README.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
The librarian agent's system prompt contained incorrect example function
names for the Exa web search tool, causing the agent to call a non-existent
tool 'websearch_exa_web_search_exa' instead of the correct
'websearch_web_search_exa'.
Fixes#2242
compactedSessions permanently blocked re-compaction after first success,
causing unbounded context growth (e.g. 500k on Kimi K2.5 with 256k limit).
- Clear compactedSessions flag on new message.updated so compaction can
re-trigger when context exceeds threshold again
- Use modelContextLimitsCache for model-specific context limits instead
of always falling back to 200k for non-Anthropic providers
Hashline edit tool and companion hooks now require explicit opt-in
via `"hashline_edit": true` in config. Previously enabled by default.
- tool-registry: hashline edit tool not registered unless opted in
- create-tool-guard-hooks: hashline-read-enhancer disabled by default
- Updated config schema comment and documentation
- Added TDD tests for default behavior
Previously decoded entire image buffer to read headers. Now slices base64
to 32KB prefix before decoding — sufficient for PNG/GIF/WebP/JPEG headers.
Dramatically reduces memory allocation for large images.
applyToolConfig() unconditionally set question permission based only on
OPENCODE_CLI_RUN_MODE, ignoring the question:deny already configured via
OPENCODE_CONFIG_CONTENT. This caused agents to hang in headless environments
(e.g. Maestro Auto Run) where the host sets question:deny but does not
know about the plugin-internal OPENCODE_CLI_RUN_MODE variable.
Read permission.question from OPENCODE_CONFIG_CONTENT and give it highest
priority: config deny > CLI run mode deny > default allow.
Ripgrep's --glob flag silently returns zero results when the search target
is an absolute path and the pattern contains directory prefixes (e.g.
'apps/backend/**/*.ts' with '/project'). This is a known ripgrep behavior
where glob matching fails against paths rooted at absolute arguments.
Fix by running ripgrep with cwd set to the search path and '.' as the
search target, matching how the find backend already operates. Ripgrep
then sees relative paths internally, so directory-prefixed globs match
correctly. Output paths are resolved back to absolute via resolve().
Intercepts Read tool output with image attachments and resizes to comply with Anthropic API limits (≤1568px long edge, ≤5MB). Only activates for Anthropic provider sessions and appends resize metadata (original/new resolution, token count) to tool output.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Non-Claude models skip planning and under-parallelize. Two new sections
injected only when model is not Claude:
- Plan Agent Dependency: multi-step tasks MUST consult Plan Agent first,
use session_id for follow-ups, ask aggressively when ambiguous
- Deep Parallel Delegation (rewrite): explicit '4 units = 4 agents'
pattern, each with clear GOAL + success criteria, all run_in_background
Sisyphus prompt instructed 'your next action is background_output' which
caused agents to repeatedly poll running tasks instead of ending their
response and waiting for the system notification.
- Replace 'STOP all other output' with 'end your response' (actionable)
- Add system-reminder notification mechanism explanation
- Add explicit 'Do NOT poll' prohibition
- Reduce background_cancel(all=true) mentions from 5x to 1x (Hard Blocks)
- Reduce Oracle collect obligation from 4x to 2x
- Remove motivational fluff ('blind spots', 'normal and expected')
Net: -2 lines, clearer mechanism, eliminates polling loop root cause.
- Replace two-pass env interpolation with single-pass combined regex to
prevent re-interpolation of $-sequences in substituted header values
- Convert HookEntry to discriminated union so type: "http" requires url,
preventing invalid configs from passing type checking
- Add regression test for double-interpolation edge case
Add type: "http" hook support matching Claude Code's HTTP hook specification.
HTTP hooks send POST requests with JSON body, support env var interpolation
in headers via allowedEnvVars, and configurable timeout.
New files:
- execute-http-hook.ts: HTTP hook execution with env var interpolation
- dispatch-hook.ts: Unified dispatcher for command and HTTP hooks
- execute-http-hook.test.ts: 14 tests covering all HTTP hook scenarios
Modified files:
- types.ts: Added HookHttp interface, HookAction union type
- config.ts: Updated to accept HookAction in raw hook matchers
- pre-tool-use/post-tool-use/stop/user-prompt-submit/pre-compact:
Updated all 5 executors to dispatch HTTP hooks via dispatchHook()
- plugin-loader/types.ts: Added "http" to HookEntry type union
When models pass relative paths (e.g. 'apps/ios/CleanSlate') to glob/grep
tools, they were passed directly to ripgrep which resolved them against
process.cwd(). In OpenCode Desktop, process.cwd() is '/' causing all
relative path lookups to fail with 'No such file or directory'.
Fix: use path.resolve(ctx.directory, args.path) to resolve relative paths
against the project directory instead of relying on process.cwd().
Add start_work.auto_commit configuration option to allow users to
disable the automatic commit step in the /start-work workflow.
When auto_commit is false:
- STEP 8: COMMIT ATOMIC UNIT is removed from orchestrator reminder
- STEP 9: PROCEED TO NEXT TASK becomes STEP 8
Resolves#2197
GPT-5.x Codex models occasionally hallucinate malformed tool calls like:
assistant to=functions.XXX <garbled_unicode>json\n{...}
This is a model-level issue where the model outputs tool calls as text
instead of using the native tool calling mechanism. Add explicit prompt
instructions telling the model to use the tool call interface.
Related: #2190
The auto-update checker hook calls getConfigDir() which requires the
config context to be initialized via initConfigContext(). When running
in the OpenCode TUI plugin runtime (vs CLI), this initialization never
happened, causing the warning:
"getConfigContext() called before initConfigContext(); defaulting to CLI paths."
This warning would appear when opening folders containing .mulch or .beads
directories because the lifecycle plugins triggered the auto-update checker.
Fix: Call initConfigContext("opencode", null) at plugin startup to ensure
the config context is properly initialized for all hooks and utilities.
Fixes upstream issue where TUI users see spurious bun install warnings.
getConfigContext() emitted a console.warn when called before
initConfigContext() completed. Since initConfigContext runs async
(spawns opencode --version subprocess), other modules calling
getConfigDir/getConfigJson could trigger this warning during startup.
The fallback behavior is intentional and safe (defaults to standard
CLI paths), but console.warn writes to stderr which the TUI captures,
causing the warning to render inside the user's textbox.
Fixes#2183
The hint '(MOST COMMON for single-line edits)' misleads agents into
thinking pos-only replace is the default behavior. When agents want
to replace multiple lines but only specify pos without end, the tool
only replaces one line, causing duplicate code from retained lines.
Allow users to set `background_task.syncPollTimeoutMs` in config to override
the default 10-minute sync subagent timeout. Affects sync task, sync continuation,
and unstable agent task paths. Minimum value: 60000ms (1 minute).
Co-authored-by: Wine Fox <fox@ling.plus>
Prefer the in-memory session agent set by /start-work when validating idle continuation eligibility, so stale message storage agent values do not block boulder continuation.
Bun's internal download of baseline compile targets from npm registry
consistently fails on Windows CI runners (ExtractionFailed error).
Pre-download the baseline binary via curl into Bun's cache directory
so the compile step finds it already cached and skips the download.
Also makes publish job resilient with if: always() so one failed
platform doesn't block publishing all other successful platforms.
publish job now runs with if: always() && !cancelled(), and gates
each publish step on download.outcome == 'success'. One flaky target
(e.g. windows-x64-baseline) no longer blocks all other platforms.
The auto-commit section instructed Hephaestus to automatically commit after
implementation work. Users who didn't know about this behavior would get
surprise commits — a trust-breaking behavioral change flagged by 5 Oracle
reviews as the sole publish blocker for 3.9.0.
- headless.ts: emit error field on tool_result when output starts with Error:
- test-multi-model.ts: errored/timed-out models now shown as RED and exit(1)
- test-multi-model.ts: validate --timeout arg (reject NaN/negative)
- test-edge-cases.ts: use exact match instead of trim() for whitespace test
- test-edge-cases.ts: skip file pre-creation for create-via-append test
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Standalone headless agent using Vercel AI SDK v6 with FriendliAI provider.
Imports hashline-edit pure functions directly from src/ for benchmarking
the edit tool against LLMs (Minimax M2.5 via FriendliAI).
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Canonicalize anchors in dedupe keys to handle whitespace variants
- Make lines field required in edit operations
- Only allow unanchored append/prepend to create missing files
- Reorder delete/rename validation to prevent edge cases
- Add allow_non_gpt_model and max_prompt_tokens to config schema
```
date is already injected by OpenCode's system.ts. omo-env now contains only
Timezone and Locale, which are stable across requests and never break cache.
Current time with HH:MM:SS changed every second, invalidating the prompt cache
on every request. Date-level precision is sufficient; timezone and locale are
stable. Removes Current time field entirely from createEnvContext output.
Drop provider-specific thinking config injection (THINKING_CONFIGS, getThinkingConfig,
resolveProvider) and instead rely on the provider to handle thinking based on the variant field.
Hook now fires on chat.message using model from input rather than from the message object.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Delegate thinking config control to the provider layer rather than
injecting it manually in ultrawork model override.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
OpenCode's copilot fetch wrapper already sets x-initiator based on the
actual HTTP request body content. When oh-my-opencode's chat.headers
hook overrides it with 'agent', the Copilot API detects a mismatch
between the header and the request body and rejects the request with
'invalid initiator'.
This matches the approach OpenCode's own chat.headers handler uses
(copilot.ts:314) — it explicitly skips @ai-sdk/github-copilot models
because the fetch wrapper handles x-initiator correctly on its own.
The getNextFallback function returned raw model names from the
hardcoded fallback chain without transforming them for the target
provider. For example, github-copilot requires dot notation
(claude-sonnet-4.6) but the fallback chain stores hyphen notation
(claude-sonnet-4-6).
The background-agent retry handler already calls
transformModelForProvider correctly, but the sync chat.message
hook in model-fallback was missing it — a copy-paste omission.
Add transformModelForProvider call in getNextFallback and a test
verifying github-copilot model name transformation.
When a background task completes and the parent session is waiting for
user input, promptAsync() fails with an aborted error. Previously the
notification was silently dropped — lost forever.
Fix: queue the notification text in-memory on the BackgroundManager
when promptAsync fails with an aborted/idle error. On the user's next
message to that session, the queued notifications are injected into the
chat context before the agent sees the message.
- BackgroundManager: add pendingNotifications map + queuePendingNotification()
and injectPendingNotificationsIntoChatMessage() methods
- background-notification hook: add chat.message handler that calls injection
- chat-message.ts: wire backgroundNotificationHook.chat.message into the
message processing chain
- Add tests covering queue-on-abort and next-message delivery
- Rename test to 'should inject when last agent is sisyphus and boulder targets atlas
explicitly' and flip expectation to toHaveBeenCalled() - the old assertion was
testing the buggy deadlock behavior
- Add 'should not inject when last agent is non-sisyphus and does not match boulder
agent' to verify hephaestus (unrelated agents) are still correctly skipped
The test 'hephaestus is created when github-copilot provider is connected'
had incorrect expectation. github-copilot does not provide gpt-5.3-codex,
so hephaestus should NOT be created when only github-copilot is connected.
This test was causing CI flakiness due to incorrect assertion and
missing readConnectedProvidersCache mock (state pollution between tests).
Also adds cacheSpy mock for proper isolation.
The test 'hephaestus is created when github-copilot provider is connected'
had incorrect expectation. github-copilot does not provide gpt-5.3-codex,
so hephaestus should NOT be created when only github-copilot is connected.
This test was causing CI flakiness due to incorrect assertion and
missing readConnectedProvidersCache mock (state pollution between tests).
Also adds cacheSpy mock for proper isolation.
The test 'hephaestus is created when github-copilot provider is connected'
had incorrect expectation. github-copilot does not provide gpt-5.3-codex,
so hephaestus should NOT be created when only github-copilot is connected.
This test was causing CI flakiness due to incorrect assertion and
missing readConnectedProvidersCache mock (state pollution between tests).
Also adds cacheSpy mock for proper isolation.
The boulder continuation in event-handler.ts skipped injection whenever
the last agent was 'sisyphus' and the boulder state had agent='atlas'
set explicitly. The allowSisyphusWhenDefaultAtlas guard required
boulderAgentWasNotExplicitlySet=true, but start-work-hook.ts always
calls createBoulderState(..., 'atlas') which sets the agent explicitly.
This created a chicken-and-egg deadlock: boulder continuation needs
atlas to be the last agent, but the continuation itself is what switches
to atlas. With /start-work, the first iteration was always blocked.
Fix: drop the boulderAgentWasNotExplicitlySet constraint so Sisyphus is
always allowed when the boulder targets atlas (whether explicit or default).
Also reduce todo-continuation-enforcer CONTINUATION_COOLDOWN_MS from
30s to 5s to match atlas hook cooldown and recover interruptions faster.
Hephaestus requires GPT models, which can be provided by github-copilot.
The requiresProvider list was missing github-copilot, causing hephaestus
to not be created when github-copilot was the only GPT provider connected.
This also fixes a flaky CI test that documented this expected behavior.
Hephaestus requires GPT models, which can be provided by github-copilot.
The requiresProvider list was missing github-copilot, causing hephaestus
to not be created when github-copilot was the only GPT provider connected.
This also fixes a flaky CI test that documented this expected behavior.
Hephaestus requires GPT models, which can be provided by github-copilot.
The requiresProvider list was missing github-copilot, causing hephaestus
to not be created when github-copilot was the only GPT provider connected.
This also fixes a flaky CI test that documented this expected behavior.
gpt-5.3-codex is not available on GitHub Copilot. The fallback chains
incorrectly listed github-copilot as a valid provider for this model,
causing the doctor to report 'configured model github-copilot/gpt-5.3-codex
is not valid' for Hephaestus agent.
Affected agents: hephaestus (requiresProvider + fallbackChain)
Affected categories: ultrabrain, deep, unspecified-low
Copilot users can still use Hephaestus via openai or opencode providers.
Fixes#2047
- Fix operator precedence bug in hasActiveWork boolean expression
- Reuse getMainSessionStatus result from watchdog to avoid duplicate API calls
- Add flag to only check secondary timeout once to avoid unnecessary API traffic
Implements fixes from issue #1880 and #1934 to prevent exit code 130 timeout in CI environments:
- Add lastEventTimestamp to EventState for tracking when events were last received
- Add event watchdog: if no events for 30s, verify session status via direct API call
- Add secondary timeout: after 60s without meaningful work events, check for active children/todos and assume work is in progress
This prevents the poll loop from waiting for full 600s timeout when:
1. Event stream drops silently (common in CI with network instability)
2. Main session delegates to children without producing meaningful work on main session
- Replace regex /^([A-Za-z_]+)#.../ with indexOf-based prefix check to catch
line-ref#VK and line.ref#VK style inputs that were previously giving generic errors
- Extract parseLineRefWithHint helper to eliminate duplicated try-catch in
validateLineRef and validateLineRefs
- Restore idempotency guard in appendWriteHashlineOutput using new output format
- Add tests for LINE42 extraction, line-ref hint, line.ref hint, and guard behavior
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- fix(hooks): skip todo continuation when agent has pending question (#1888)
Add pending-question-detection module that walks messages backwards
to detect unanswered question tool_use, preventing CONTINUATION_PROMPT
injection while awaiting user response.
- fix(config): allow custom agent names in disabled_agents (#1693)
Change disabled_agents schema from BuiltinAgentNameSchema to z.string()
and add filterDisabledAgents helper in agent-config-handler to filter
user, project, and plugin agents with case-insensitive matching.
- fix(agents): change primary agents mode to 'all' (#1891)
Update Sisyphus, Hephaestus, and Atlas agent modes from 'primary'
to 'all' so they are available for @mention routing and task()
delegation in addition to direct chat.
EventState interface gained new required fields; the inline literal in the
session.status test was missing them, causing type errors and runtime failures.
applyToolConfig() forcibly overrode the user's external_directory
permission to 'allow' by placing OMO defaults after the user config
spread. Reorder so defaults come first and user config spreads on
top, allowing users to set 'ask' or 'deny'. The task permission
remains forced to 'deny' after the spread for security.
Closes#1973
Replace full hashlined file content in write tool response with a simple
'File written successfully. N lines written.' summary to reduce context
bloat.
- Detect non-numeric prefixes (e.g., "LINE#HK", "POS#VK") and explain
that the prefix must be an actual line number, not literal text
- Add suggestLineForHash() that reverse-looks up a hash in file lines
to suggest the correct reference (e.g., Did you mean "1#HK"?)
- Unify error message format from "LINE#ID" to "{line_number}#{hash_id}"
matching the tool description convention
- Add 3 tests covering non-numeric prefix detection and hash suggestion
applySetLine, applyReplaceLines, applyInsertAfter, applyInsertBefore
were re-exported from both edit-operations.ts and index.ts but have no
external consumers — they are only used internally within the module.
Only applyHashlineEdits (the public API) remains exported.
Remove the old LINE:HEX (e.g. "42:ab") reference format support. All
refs now use LINE#ID format exclusively (e.g. "42#VK"). Also fixes
HASHLINE_OUTPUT_PATTERN to use | separator (was missed in PR #2079).
This function is no longer called from edit-operations.ts after the
op/pos/end/lines schema refactor in PR #2079. Remove the function
definition and its 3 dedicated test cases.
restoreIndentForPairedReplacement() and restoreLeadingIndent() unconditionally
restored original indentation when replacement had none, preventing intentional
indentation changes (e.g. removing a tab from '\t1절' to '1절'). Skip indent
restoration when trimmed content is identical, indicating a whitespace-only edit.
Change LINE#HASH:content format to LINE#HASH|content across the entire
codebase. The pipe separator is more visually distinct and avoids
conflicts with TypeScript colons in code content.
15 files updated: implementation, prompts, tests, and READMEs.
OpenCode updated its read tool output format — the <content> tag now shares
a line with the first content line (<content>1: content) with no newline.
The hook's exact indexOf('<content>') detection returned -1, causing all
read output to pass through unmodified (no hash anchors). This silently
disabled the entire hashline-edit workflow.
Fixes:
- Sub-bug 1: Use findIndex + startsWith instead of exact indexOf match
- Sub-bug 2: Extract inline content after <content> prefix as first line
- Sub-bug 3: Normalize open-tag line to bare tag in output (no duplicate)
Also adds backward compat for legacy <file> + 00001| pipe format.
Unify internal hashline edit handling around replace/append/prepend to remove legacy operation shapes. This keeps normalization, ordering, deduplication, execution, and tests aligned with the new op/pos/end/lines contract.
Move blocking/polling logic before full_session branch so that
block=true waits for task completion regardless of output format.
🤖 Generated with assistance of oh-my-opencode
Unify hashline_edit input with replace/append/prepend + pos/end/lines semantics so callers use a single stable shape. Add normalization coverage and refresh tool guidance/tests to reduce schema confusion and stale legacy payload usage.
- Add normalizeModelFormat() utility for string/object model handling
- Update subagent-resolver to handle both model formats
- Add explicitUserConfig flag to ModelResolutionResult
- Set explicitUserConfig: true when user model is found in pipeline
This fixes the issue where plugin-provided models fail cache validation
and fall through to random fallback models.
Keep current field names but accept a pi-style flexible edit payload that is normalized to concrete operations at execution time.
Response now follows concise update/move status with diff metadata retained, removing full-file hashline echo to reduce model feedback loops.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Switch line hashing to significance-aware seeding so meaningful lines stay stable across reflows while punctuation-only lines still disambiguate by line index.
Also narrow prefix stripping to hashline/diff patterns that reduce accidental content corruption during edit normalization.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add SDKMessage local interface for message type safety
Replace any lambda params and message casts with SDKMessage
Remove eslint-disable comments for no-explicit-any
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Remove @ts-ignore and eslint-disable comments from executor.ts and recovery-hook.ts
- Change client: any to client: Client with proper import
- Rename experimental to _experimental for unused parameter
- Remove @ts-ignore for ctx.client casts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Create dedicated Gemini ultrawork variant that enforces intent
classification as mandatory Step 0 before any action. Routes Gemini
models to the new variant via source-detector priority chain
(planner > GPT > Gemini > default). Includes anti-optimism checkpoint
and tool-call mandate sections tuned for Gemini's eager behavior.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Counter Gemini's tendency to skip Phase 0 intent classification by
injecting a mandatory self-check gate before tool calls. Includes
intent type classification, anti-skip mechanism, and common mistake
table showing wrong vs correct behavior per intent type.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Model fallback is now opt-in via `model_fallback: true` in plugin config,
matching the runtime-fallback pattern. Prevents unexpected automatic model
switching on API errors unless explicitly enabled.
- formatter.test.ts: use dynamic imports with cache-busting to avoid mock pollution from runner.test.ts; test real format output instead of dispatch mocking
- hook.test.ts: rewrite with proper branch coverage (7 tests), add success/guard/subagent paths
- background-update-check.test.ts: rewrite with 10 tests covering all branches (early returns, pinned versions, auto-update success/failure)
- directory-agents-injector/injector.test.ts: replace finder/storage mocks with real filesystem + temp directories, verify actual AGENTS.md injection content
- directory-readme-injector/injector.test.ts: same pattern as agents-injector but for README.md, verifies root inclusion behavior
Remove blocking logic that prevented writes to files outside the
session directory. The guard now only applies to files within the
session directory, allowing free writes to external paths.
- Remove OUTSIDE_SESSION_MESSAGE constant
- Update test to expect outside writes to be allowed
- Add early return for paths outside session directory
- Keep isPathInsideDirectory for session boundary check
TDD cycle:
1. RED: Update test expectation
2. GREEN: Implement early return for outside paths
3. REFACTOR: Clean up unused constants
The extracted handleSessionIdleBackgroundEvent was never imported by
manager.ts — dead code from incomplete refactoring (d53bcfbc). Replace
the inline session.idle handler (58 LOC) with a call to the extracted
function, remove unused MIN_IDLE_TIME_MS import, and add 13 unit tests
covering all edge cases.
Add 'gemini' to prompt source types and route Gemini models to new
Gemini-optimized prompts via isGeminiModel detection. Update barrel
exports for all 3 agent modules. All existing tests pass.
- Track matchedLen separately for stripped continuation token matches
- Map fuzzy index back to original string position via character-by-character
scan that skips operator chars, fixing positional correctness
- maybeExpandSingleLineMerge now uses stripTrailingContinuationTokens and
stripMergeOperatorChars as fallback matching strategies
- Add 'refs interpreted against last read' atomicity clause to tool description
- Add 'output tool calls only; no prose' rule to tool description
- Rewrite restoreOldWrappedLines to use oh-my-pi's span-scanning algorithm
- Add stripTrailingContinuationTokens and stripMergeOperatorChars helpers
- Fix detectLineEnding to use first-occurrence logic instead of any-match
- Fix applyAppend/applyPrepend to replace empty-line placeholder in empty files
- Enhance tool description with 7 critical rules, tag guidance, and anti-patterns
- Revert getMessageDir to original join(MESSAGE_STORAGE, sessionID) behavior
- Fix dead subagentSessions.delete by capturing previousSessionID before tryFallbackRetry
- Add .unref() to process cleanup setTimeout to prevent 6s hang on Ctrl-C
- Add missing isUnstableAgent to fallback retry input mapping
- Fix process-cleanup tests to use exit listener instead of SIGINT at index 0
- Swap test filenames in compaction-aware-message-resolver to exercise skip logic correctly
- Use detached process group (non-Windows) + process.kill(-pid) to kill
the entire process tree, not just the outer shell wrapper
- Add proc.stdin error listener to absorb EPIPE when child exits before
stdin write completes
- code ?? 0 → code ?? 1: signal-terminated processes return null exit code,
which was incorrectly coerced to 0 (success) instead of 1 (failure)
- wrap proc.kill(SIGTERM) in try/catch to match SIGKILL guard and prevent
EPERM/ESRCH from crashing on already-dead processes
Add buildDeepParallelSection() function that injects guidance for non-Claude
models on parallel deep agent delegation:
- Detect when model is non-Claude and 'deep' category is available
- Inject instructions to decompose tasks and delegate to deep agents in parallel
- Give goals, not step-by-step instructions to deep agents
- Update Sisyphus prompt builder to pass model and call new function
This helps GPT-based Sisyphus instances leverage deep agents more effectively
for complex implementation tasks.
🤖 Generated with assistance of OhMyOpenCode
Completely restructure the documentation to explain model-agent matching
through the "Models Are Developers" lens:
- Add narrative sections on Sisyphus (sociable lead) and Hephaestus (deep specialist)
- Explain Claude vs GPT thinking differences (mechanics vs principles)
- Reorganize agent profiles by personality type (communicators, specialists, utilities)
- Simplify model families section
- Add "About Free-Tier Fallbacks" section
- Move example configuration to customization section
This makes the guide more conceptual and memorable for users customizing
agent models.
🤖 Generated with assistance of OhMyOpenCode
Problem: Agents frequently omit both 'category' and 'subagent_type' parameters
when calling the task() tool, causing validation failures. The JSON Schema
marks both as optional, and LLMs follow schema structure over description text.
Solution (Option A): Add aggressive visual warnings and failure-mode examples
to the tool description:
- ⚠️ CRITICAL warning header
- COMMON MISTAKE example showing what will FAIL
- CORRECT examples for both category and subagent_type usage
- Clear explanation that ONE must be provided
Tests: All 153 existing tests pass (no behavior change, only prompt improvement)
Connect the existing plugin loader infrastructure to both slash command
dispatch paths (executor and slashcommand tool), enabling namespaced
commands like /daplug:run-prompt to resolve and execute.
- Add plugin discovery to executor.ts discoverAllCommands()
- Add plugin discovery to command-discovery.ts discoverCommandsSync()
- Add "plugin" to CommandScope type
- Remove blanket colon-rejection error (replaced with standard not-found)
- Update slash command regex to accept namespaced commands
- Thread claude_code.plugins config toggle through dispatch chain
- Add unit tests for plugin command discovery and dispatch
Closes#2019🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Codex <noreply@openai.com>
- Add Quick Start section with clear installation link
- Add 'How It Works: Agent Orchestration' section linking to orchestration.md
- Add 'Agent Model Matching' section with JSON configuration examples
- Restructure content flow for better readability
- Add example JSON config to agent-model-matching.md
- Maintain original voice and strong opinions while improving organization
- All links now properly reference related docs
Update README with Anthropic blocking mention and revised model descriptions.
Fix markdown table alignment in both README and installation guide.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Injects a reminder after 10 tool turns without task tool usage. Tracks
per-session counters and cleans up on session deletion.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Restore docs/guide/agent-model-matching.md that was accidentally deleted
in commit 880c5e3b (docs restructure). Updated broken links to point to
current documentation structure.
- Remove runtime-fallback gate from session.status retry handler — runtime-fallback
has no session.status handler, so gating it causes retry signals to be silently dropped
- Fix background_output full_session arg description: default is true, not false
- Remove leading ./ from bin entry (npm strips invalid paths)
- Write schema to dist/ for export map compatibility (keep assets/ for GitHub URL)
- Remove unused codex dep + bump @modelcontextprotocol/sdk to ^1.25.2
- Fix broken relative link in configuration.md (../guide/installation.md)
🤖 Generated with assistance of OhMyOpenCode (https://github.com/code-yeongyu/oh-my-opencode)
- runtime-fallback: guard session.error with sessionRetryInFlight to prevent
double-advance during active retry; expand session.stop abort to include
sessionAwaitingFallbackResult; remove premature pendingFallbackModel clearing
from auto-retry finally block
- hashline-edit: add HASHLINE_LEGACY_REF_PATTERN for backward-compatible
LINE:HEX dual-parse in parseLineRef and normalizeLineRef
- tmux-subagent: defer session on null queryWindowState; unconditionally
re-queue deferred session on spawn failure (not just close+spawn)
- ultrawork-db: wrap new Database(dbPath) in try/catch to handle corrupted DB
- event: add try/catch guards around model-fallback logic in message.updated,
session.status, and session.error handlers
Prevent cross-file mock.module leakage by restoring Bun mocks after recovery-hook test, so executor tests always run against the real module implementation.
Stop patching global timers in every lock-management test. Use scoped fake timers only in continuation tests so lock/notification assertions remain deterministic in CI.
Remove duplicated dispatchToHooks declaration that broke TypeScript parsing, and isolate chat-headers tests from marker cache collisions with unique message IDs.
Closes#1901
Add 'default_strategy' config option (default: 'continue') to control whether ralph-loop creates a new session per iteration ('reset') or keeps the same session ('continue'). The 'reset' strategy keeps the model in the smart zone by starting with fresh context for each iteration.
Supports --strategy flag for per-command override.
recovery-hook.test.ts uses mock.module() at top level which patches the
executor module in the shared bun module cache. When run in the same
batch as executor.test.ts, executeCompact becomes the mocked no-op version,
causing all lock management tests to fail.
Move it to the isolated step (each file gets its own bun process) and
enumerate the remaining anthropic-context-window-limit-recovery test files
explicitly to avoid including recovery-hook.test.ts in the batch.
recovery-hook.test.ts uses mock.module() at top level which patches the
executor module in the shared bun module cache. When run in the same
batch as executor.test.ts, executeCompact becomes the mocked no-op version,
causing all lock management tests to fail.
Move it to the isolated step (each file gets its own bun process) and
enumerate the remaining anthropic-context-window-limit-recovery test files
explicitly to avoid including recovery-hook.test.ts in the batch.
recovery-hook.test.ts uses mock.module() at top level which patches the
executor module in the shared bun module cache. When run in the same
batch as executor.test.ts, executeCompact becomes the mocked no-op version,
causing all lock management tests to fail.
Move it to the isolated step (each file gets its own bun process) and
enumerate the remaining anthropic-context-window-limit-recovery test files
explicitly to avoid including recovery-hook.test.ts in the batch.
Replace exact call count assertions with delta-based checks:
- capture errorSpy.mock.calls.length before processing events
- slice to only check calls made during this test's execution
- use try/finally to guarantee mockRestore() even on assertion failure
This prevents test pollution from cross-file spy leakage in CI batch runs.
Replace exact call count assertions with delta-based checks:
- capture errorSpy.mock.calls.length before processing events
- slice to only check calls made during this test's execution
- use try/finally to guarantee mockRestore() even on assertion failure
This prevents test pollution from cross-file spy leakage in CI batch runs.
Replace exact call count assertions with delta-based checks:
- capture errorSpy.mock.calls.length before processing events
- slice to only check calls made during this test's execution
- use try/finally to guarantee mockRestore() even on assertion failure
This prevents test pollution from cross-file spy leakage in CI batch runs.
Complete the integration of multi-model future vision into all 4 READMEs:
- Korean: Full tagline with orchestration, cheaper/smarter models, open market
- Chinese: Full tagline with model orchestration and future betting
All READMEs now consistently convey: we ride all models, not just Claude.
The future is multi-model orchestration, not picking one winner.
Remove separate 'The Bet' sections and weave the multi-model future
vision directly into the existing 'steroids/prison' taglines:
- English: Expanded tagline with model orchestration and future betting
- Korean: '우리가 보는 미래' 섹션 제거, 태그라인에 통합
- Japanese: '私たちが賭ける未来' セクション削除、タグライン統合
- Chinese: '我们押注的未来' 部分删除,整合到标语中
Key message: Models get cheaper/smarter every month. No provider
will dominate. We ride them all. Built for the open market.
Add 'The Bet' section to all 4 language READMEs (en, ko, ja, zh-cn):
- Models getting cheaper every month
- Models getting smarter every month
- No single provider will dominate the future
- We leverage ALL models, not just Claude
- Architecture gets more valuable as models specialize
- We're building for the open multi-model future
Also update overview.md to move 'Better Than Pure Codex' into
Hephaestus section and add 'Better Than Pure Claude Code' section
with fundamental multi-model advantage explanation.
Fixes#1920
Installer-written exact versions (e.g., oh-my-opencode@3.5.2) were incorrectly treated as user-pinned, blocking auto-updates for all installer users.
Fix isPinned to only block auto-update when pinnedVersion is an explicit semver string (user's intent). Channel tags (latest, beta, next) and bare package name all allow auto-update.
Fix installer fallback to return bare PACKAGE_NAME for stable versions and PACKAGE_NAME@{channel} for prerelease versions, preserving channel tracking.
Fixes#1804, fixes#1962
The migration entry 'gpt-5.2-codex → gpt-5.3-codex' caused the plugin to silently overwrite user configs on every startup with a model that doesn't exist in the OpenAI API. Users explicitly setting gpt-5.2-codex (the correct current model) were forced to revert their config manually every session.
Rewrote prose for SF engineer voice: shorter sentences, punchier copy,
no em-dashes. Kept all structural elements, badges, testimonials, and
code blocks unchanged.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
- Use tool.schema.enum() for output_mode instead of generic string()
- Remove unsafe type assertion for output_mode
- Fix files_with_matches mode returning empty results by adding
filesOnly flag to parseOutput for --files-with-matches rg output
- Add --threads=4 flag to all rg invocations (grep and glob)
- Add global semaphore limiting concurrent rg processes to 2
- Reduce grep timeout from 300s to 60s (matches tool description)
- Reduce max output from 10MB to 256KB (prevents excessive memory usage)
- Add output_mode parameter (content/files_with_matches/count)
- Add head_limit parameter for incremental result fetching
Closes#2008
Ref: #674, #1722
- BUG-7: Add gpt-5-nano as final fallback in multimodal-looker model requirements
- BUG-14: Remove hardcoded LIBRARIAN_MODEL, let librarian resolve through normal fallback chain
- Update snapshots and tests to reflect new fallback behavior
When browserProvider is not set, agent-browser skill should NOT resolve.
Test assertions were inverted — expected 'Skills not found' but asserted the opposite.
Unhandled SQLite exceptions (SQLITE_BUSY, database locked, etc.) in queueMicrotask/setTimeout callbacks could crash the entire process. Added try/catch/finally to ensure db.close() is always called and errors are logged instead of crashing.
Split 1021-line index.ts into 10 focused modules per project conventions.
New structure:
- error-classifier.ts: error analysis with dynamic status code extraction
- agent-resolver.ts: agent detection utilities
- fallback-state.ts: state management and cooldown logic
- fallback-models.ts: model resolution from config
- auto-retry.ts: retry helpers with mutual recursion support
- event-handler.ts: session lifecycle events
- message-update-handler.ts: message.updated event handling
- chat-message-handler.ts: chat message interception
- hook.ts: main factory with proper cleanup
- types.ts: updated with HookDeps interface
- index.ts: 2-line barrel re-export
Embedded fixes:
- Fix setInterval leak with .unref()
- Replace require() with ESM import
- Add log warning on invalid model format
- Update sessionLastAccess on normal traffic
- Make extractStatusCode dynamic from config
- Remove unused SessionErrorInfo type
All 61 tests pass without modification.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Remove duplicate Runtime Fallback section from configurations.md.
Fix max_fallback_attempts range from (1-10) to (1-20) to match schema.
Update retry_on_errors default to include 400 status code.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Revert test name and assertion to original behavior per PR review feedback.
The test now correctly expects Atlas to respect uiSelectedModel instead of using its own fallback chain.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Make provider auto-retry signal detection respect timeout_seconds setting:
- When timeout_seconds=0, disable quota-based fallback escalation
- Only treat auto-retry signals as errors when timeout is enabled
- Add test to verify behavior when timeout_seconds is disabled
- Update documentation to explain timeout_seconds=0 behavior
This allows users to disable timeout-based fallbacks while keeping
error-based fallback functionality intact.
Refactor retry signal detection to be provider-agnostic:
- Replace hardcoded Copilot/OpenAI checks with generic pattern matching
- Detect any provider message containing limit/quota keywords + [retrying in X]
- Add OpenAI pattern: 'usage limit has been reached [retrying in X]'
- Update logging to use generic 'provider' instead of specific names
- Add 'usage limit has been reached' to RETRYABLE_ERROR_PATTERNS
This enables fallback escalation for any provider that signals automatic
retries due to quota/rate limits, not just Copilot and OpenAI.
Closes PR discussion: generalize retry pattern detection
- Extract duplicated auto-retry logic (~40 lines each) from session.error and
message.updated handlers into shared autoRetryWithFallback() helper
- Fix userFallbackModels path in model-resolution-pipeline to respect
constraints.connectedProviders parameter instead of reading cache directly,
matching the behavior of categoryDefaultModel and fallbackChain paths
Bug fixes:
1. extractStatusCode: handle nested data.statusCode (Anthropic error structure)
2. Error regex: relax credit.*balance.*too.*low pattern for multi-char gaps
3. Zod schema: bump max_fallback_attempts from 10 to 20 (config rejected silently)
4. getFallbackModelsForSession: fallback to sisyphus/any agent when session.error lacks agent
5. Model detection: derive model from agent config when session.error lacks model info
6. Auto-retry: resend last user message with fallback model via promptAsync
7. Persistent fallback: override model on every chat.message (not just pendingFallbackModel)
8. Manual model change: detect UI model changes and reset fallback state
9. Agent preservation: include agent in promptAsync body to prevent defaulting to sisyphus
Additional:
- Add sessionRetryInFlight guard to prevent double-retries
- Add resolveAgentForSession with 3-tier resolution (event → session memory → session ID)
- Add normalizeAgentName for display names like "Prometheus (Planner)" → "prometheus"
- Add resolveAgentForSessionFromContext to fetch agent from session messages
- Move AGENT_NAMES and agentPattern to module scope for reuse
- Register runtime-fallback hooks in event.ts and chat-message.ts
- Remove diagnostic debug logging from isRetryableError
- Add 400 to default retry_on_errors and credit/balance patterns to RETRYABLE_ERROR_PATTERNS
The \b word boundary regex treats '-' as a boundary, causing
'sisyphus-junior-session-123' to incorrectly match 'sisyphus'
instead of 'sisyphus-junior'.
Sorting agent names by length (descending) ensures longer names
are matched first, fixing the hyphenated agent detection issue.
Fixes cubic-dev-ai review issue #8
- Add normalizeFallbackModels helper to centralize string/array normalization (P3)
- Export RuntimeFallbackConfig and FallbackModels types from config/index.ts
- Fix agent detection regex to use word boundaries for sessionID matching
- Improve tests to verify actual fallback switching logic (not just log paths)
- Add SessionCategoryRegistry cleanup in executeSyncTask on completion/error (P2)
- All 24 runtime-fallback tests pass, 115 delegate-task tests pass
Replace word-boundary regex with stricter patterns that match
status codes only at start/end of string or surrounded by whitespace.
Prevents false matches like '1429' or '4290'.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Implement full fallback_models support across all integration points:
1. Model Resolution Pipeline (src/shared/model-resolution-pipeline.ts)
- Add userFallbackModels to ModelResolutionRequest
- Process user fallback_models before hardcoded fallback chain
- Support both connected provider and availability checking modes
2. Agent Utils (src/agents/utils.ts)
- Update applyModelResolution to accept userFallbackModels
- Inject fallback_models for all builtin agents (sisyphus, oracle, etc.)
- Support both single string and array formats
3. Model Resolver (src/shared/model-resolver.ts)
- Add userFallbackModels to ExtendedModelResolutionInput type
- Pass through to resolveModelPipeline
4. Delegate Task Executor (src/tools/delegate-task/executor.ts)
- Extract category fallback_models configuration
- Pass to model resolution pipeline
- Register session category for runtime-fallback hook
5. Session Category Registry (src/shared/session-category-registry.ts)
- New module: maps sessionID -> category
- Used by runtime-fallback to lookup category fallback_models
- Auto-cleanup support
6. Runtime Fallback Hook (src/hooks/runtime-fallback/index.ts)
- Check SessionCategoryRegistry first for category fallback_models
- Fallback to agent-level configuration
- Import and use SessionCategoryRegistry
Test Results:
- runtime-fallback: 24/24 tests passing
- model-resolver: 46/46 tests passing
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
transformModelForProvider only handled github-copilot provider, leaving
google provider models untransformed. This caused ProviderModelNotFoundError
when google/gemini-3-flash was sent to the API (correct ID is
gemini-3-flash-preview).
Changes:
- Add google provider to transformModelForProvider with idempotent regex
negative lookahead to prevent double -preview suffix
- Fix category-default path in model-resolution-pipeline when
availableModels is empty but connected provider exists
- Fix getFirstFallbackModel first-run path that constructed raw model IDs
without transformation
- Fix github-copilot provider gemini transforms to also use idempotent
regex (was vulnerable to double-transform)
- Extract transformModelForProvider to shared module (single source of
truth, imported by cli and shared layers)
- Add 20 new test cases: unit tests for both providers, runtime
integration tests for category-default and fallback-chain paths,
double-transform prevention for both providers
transformModelForProvider only handled github-copilot provider, leaving
google provider models untransformed. This caused ProviderModelNotFoundError
when google/gemini-3-flash was sent to the API (correct ID is
gemini-3-flash-preview).
Add google provider block with -preview suffix guard to prevent double
transformation.
Reorder tool permission spread so getAgentToolRestrictions() comes
last, allowing agent-specific restrictions to override defaults.
Fixes all 3 sites: task-starter.ts (startTask), manager.ts (startTask
and resume paths).
Previously, defaults like call_omo_agent:true would stomp agent
restrictions (e.g., explore's call_omo_agent:false) due to JS
spread semantics.
- Add Category-level fallback_models support in getFallbackModelsForSession()
- Try agent-level fallback_models first
- Then try agent's category fallback_models
- Support all builtin agents including hephaestus, sisyphus-junior, build, plan
- Expand agent name recognition regex to include:
- hephaestus, sisyphus-junior, build, plan, multimodal-looker
- Add comprehensive test coverage (6 new tests, total 24):
- Model switching via chat.message hook
- Agent-level fallback_models configuration
- SessionID agent pattern detection
- Cooldown mechanism validation
- Max attempts limit enforcement
All 24 tests passing
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add configuration schemas for runtime model fallback feature:
- RuntimeFallbackConfigSchema with enabled, retry_on_errors,
max_fallback_attempts, cooldown_seconds, notify_on_fallback
- FallbackModelsSchema for init-time fallback model selection
- Add fallback_models to AgentOverrideConfigSchema and CategoryConfigSchema
- Export types and schemas from config/index.ts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The doctor's fix messages for outdated/mismatched plugin versions were
directing users to ~/.config/opencode with `bun update`, but OpenCode
loads plugins from its cache directory (~/.cache/opencode on Linux,
~/Library/Caches/opencode on macOS). Additionally, pinned versions in
the cache package.json make `bun update` a no-op — `bun add ...@latest`
is required.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- applyReplaceLines: use stripped array directly instead of restoreLeadingIndent
- applySetLine: keep restoreLeadingIndent (1:1 replacement needs indent preservation)
- Added test case for replace_lines preserving new line indentation
- All 3025 tests pass
🤖 Generated with OhMyOpenCode assistance
- keyword-detector: always set variant to 'max' when ultrawork/ulw keyword detected
- chat-message: remove variant resolution logic to passthrough TUI variant unchanged
- Tests updated to reflect new behavior
🤖 Generated with OhMyOpenCode assistance
Add dedicated '## Hashline Edit' section to configurations.md explaining the hash-anchored Edit tool, its default-on behavior, and how to disable it or its companion hooks. Update src/config/AGENTS.md to reflect hashline_edit moved out of experimental and into root schema (27 fields).
Move hashline_edit out of experimental so it is a stable top-level config with default-on runtime behavior and explicit disable support. Add migration and tests to preserve existing experimental.hashline_edit users without breaking configs.
Breaking Changes:
- Change hashline format from 'lineNum:hex|content' to 'lineNum#CID:content'
- Replace hex-based hashing (00-ff) with CID-based hashing (ZPMQVRWSNKTXJBYH nibbles)
- Simplify constants: HASH_DICT → NIBBLE_STR + HASHLINE_DICT
- Update patterns: HASHLINE_PATTERN → HASHLINE_REF_PATTERN + HASHLINE_OUTPUT_PATTERN
Benefits:
- More compact and memorable CID identifiers
- Better alignment with LSP line reference format (lineNum#ID)
- Improved error messages and diff metadata clarity
- Remove unused toHashlineContent from diff-enhancer hook
Updates:
- Refactor hash-computation for CID generation
- Update all diff-utils to use new format
- Update hook to use raw content instead of hashline format
- Update tests to match new expectations
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
- Add no-hephaestus-non-gpt to hook list for schema validation
- Add disable_omo_env to experimental features schema
- Sync schema with existing hook and feature implementations
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
background_output auto-enabled full_session when the task was still
running, returning the entire session transcript on every poll. When
the parent agent had no other work and polled in a tight loop, this
caused massive token waste because each response dumped thousands of
tokens into the conversation history.
Default full_session to false so running-task checks return a compact
status table (~200 tokens). Callers can still pass full_session=true
explicitly when they need the full transcript.
- Refactor test descriptions for clarity regarding the presence of <omo-env> in generated prompts.
- Ensure that when disable_omo_env is true, <omo-env> is omitted from the sisyphus prompt.
- Confirm that <omo-env> remains in the prompt when disable_omo_env is not specified.
- Added a new configuration option `disable_omo_env` to control the injection of the `<omo-env>` block in agent prompts.
- Updated relevant functions and tests to support this feature, ensuring that the environment context can be toggled on or off as needed.
- Enhanced documentation to reflect the new option and its implications for API cost and cache hit rates.
Apply the internal initiator marker to automated continuation, recovery, babysitter, stop-hook, and hook-message injections so Copilot attribution consistently sets x-initiator=agent for system-generated prompts.
The installation instructions were incorrectly placed:
- 'For Humans' had the curl command (agent behavior)
- 'For LLM Agents' had the copy-paste prompt (human action)
Now correctly:
- 'For Humans': Copy-paste prompt to give to LLM agent
- 'For LLM Agents': Fetch raw installation guide via curl
Fixed in all 4 language versions (EN, KO, JA, ZH-CN).
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Include line number in hash computation to ensure uniqueness
- Add explicit examples of WRONG vs CORRECT LINE:HASH format
- Clarify that hash must be hex characters (0-9, a-f only)
- Update tests to use dynamic hash computation
- Remove formatCustomSkillsBlock function (dead code)
- Remove unused truncateDescription import
- Update buildCategorySkillsDelegationGuide to compact format
- Update tests to match new compact output
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Merge <available_skills> and <available_commands> into single <available_items>
- Sort by priority: project > user > opencode > builtin
- List skills before commands
- Add priority documentation to description
- Add 5 tests for ordering and priority
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Align edit/write hashline handling with TUI expectations by preserving metadata through tool execution, keeping unified diff raw to avoid duplicated line numbers, and tightening read/write/edit outputs plus tests for reliable agent operation.
Add Step 0 intent extraction to counter GPT 5.2's conservative grounding bias:
- Map surface questions to true action intent (e.g., "Did you do X?" → do X now)
- Verbalization pattern: model must state intent before acting, creating commitment
- Turn-end self-check to prevent stopping after only talking about work
Prevents Hephaestus from answering questions then stopping when action is implied.
- Generate unified diff for TUI display via metadata.diff
- Support write tool in addition to edit tool
- Hashline-format before/after content in filediff metadata
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add comprehensive test coverage for:
- Hash consistency validation between Read tool output and Edit tool validateLineRef
- Injected content isolation to prevent hashifying non-file-content lines
- Footer messages and system reminders that should pass through unchanged
Tests ensure Read hook properly handles content boundaries and maintains
hash validity for Edit tool operations.
🤖 Generated with assistance of oh-my-opencode
- Add per-message ultrawork mode detection via keyword matching
- Implement deferred DB override strategy using microtask retry loop
- Fall back to setTimeout after 10 microtask retries for robustness
- Update agent configuration schema with ultrawork model/variant fields
- Integrate with chat.message hook to apply overrides on detection
- Add comprehensive tests for all override scenarios
- Generated schema includes ultrawork configuration
🤖 Generated with assistance of OhMyOpenCode (https://github.com/code-yeongyu/oh-my-opencode)
Capture file content before hashline edit execution and compute filediff
metadata after, enabling opencode TUI to render inline diffs for the
plugin's edit tool (which replaces the built-in EditTool).
Add alternative providers to free-tier and cross-provider models:
- k2p5: add friendli as alternative to kimi-for-coding
- kimi-k2.5-free, minimax-m2.5-free, big-pickle, gpt-5-nano: add opencode-zen-abuse
- grok-code-fast-1: add venice as alternative to github-copilot
- glm-5: add opencode as alternative to zai-coding-plan
Remove the automatic installation of opencode-antigravity-auth plugin
when users have Gemini configured. This change addresses several issues:
1. Antigravity plugin is causing Google account bans for users
2. Users are unaware the plugin was auto-installed
3. Google has built-in OAuth for Gemini that doesn't require third-party plugins
Users who need the antigravity plugin can manually add it to their
plugin configuration if desired.
Fixes issues with unexpected plugin installation and account safety.
Complete rewrite organized around model families, agent roles,
task categories, and selection priority rules.
- Model families: Claude-like (Kimi, GLM/Big Pickle), GPT,
different-behavior (Gemini, MiniMax), speed-focused (Grok, Spark)
- Agent roles: Claude-optimized, dual-prompt, GPT-native, utility
- gpt-5.3-codex-spark: extremely fast but compacts too aggressively
- Big Pickle = GLM 4.6
- Explicit guidance: do not upgrade utility agents to Opus
- opencode models / opencode auth login references at top
- Link to orchestration system guide for task categories
Expand model-specific prompt routing section with insights from
the actual Prometheus GPT prompt development session:
- Why Claude vs GPT models need fundamentally different prompts
- Principle-driven (GPT) vs mechanics-driven (Claude) approach
- "Decision Complete" concept from Codex Plan Mode
- Why more rules help Claude but hurt GPT (contradiction surface)
- Concrete size comparison (1100 lines Claude vs 300 lines GPT)
Rewrite agent-model-matching.md as a technical reference that:
- Documents actual fallback chains from model-requirements.ts
- Explains model-specific prompt routing (Prometheus/Atlas GPT detection)
- Covers safe vs dangerous model substitutions with rationale
- Includes task categories (visual-engineering, deep, quick, etc.)
- Guides agents on how to explain model choices to users
- Adds provider priority chain
Also update installation.md to reference the guide when users
want custom model configuration, with explanation of what is
safe to change and why.
- Add docs/guide/agent-model-matching.md with TL;DR table, detailed
breakdown per agent, configuration examples, decision tree, common
pitfalls, and default fallback chains
- Update README.md to reference the guide in TOC, Just Install This
section, and Features overview
Adds a regression test for the race where /stop-continuation fires after
handleSessionIdle passes the flag check but before injectContinuation runs.
Verifies no injection occurs when the flag becomes true mid-countdown.
When /stop-continuation is invoked during the 2s countdown, the stop flag
was never checked inside injectContinuation, so the injection would still
fire after the countdown elapsed.
Propagate isContinuationStopped from handleSessionIdle through startCountdown
into injectContinuation, where it is now re-checked before any API call.
Atlas and Sisyphus prompts instructed agents to use background_cancel(all=true)
before final answers. This destroys uncollected background task results and
contradicts existing NEVER directives in the Sisyphus prompt, causing agents
to lose explore/librarian outputs mid-session.
Replace with individual task cancellation pattern that preserves completed
task results while still cleaning up running disposable tasks.
- Change MIN_STABILIZATION_MS from 0 to 1_000 to prevent premature exits
- Coerce non-positive minStabilizationMs to default instead of treating as disabled
- Fix stabilization logic: track firstWorkTimestamp inside the meaningful-work branch
- Add tests for default stabilization behavior and zero-value coercion
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Forward session.deleted events to write-existing-file-guard so per-session read permissions are actually cleared in runtime.
Add plugin-level regression test to ensure event forwarding remains wired, alongside the expanded guard behavior and unit coverage.
Remove ultrawork-model-override hook and per-agent ultrawork model swap
config that relied on zen opencode.ai free tier (no longer functional).
Removed:
- src/hooks/ultrawork-model-override/ (hook, test, index)
- ultrawork field from AgentOverrideConfigSchema
- ultrawork-model-override from HookNameSchema
- UltraworkConfig type from model-fallback-types
- Non-max20 sonnet+ultrawork-opus codepath from model-fallback
- Claude subscription model table from installation docs
- All references in plugin-interface, create-session-hooks, schema.json
- Related test cases and updated snapshots
Per reviewer feedback (code-yeongyu), keep the 'skill' tool as the main
tool and merge slashcommand functionality INTO it, rather than the reverse.
Changes:
- skill/tools.ts: Add command discovery (discoverCommandsSync) support;
handle both SKILL.md skills and .omo/commands/ slash commands in a single
tool; show combined listing in tool description
- skill/types.ts: Add 'commands' option to SkillLoadOptions
- skill/constants.ts: Update description to mention both skills and commands
- plugin/tool-registry.ts: Replace createSlashcommandTool with createSkillTool;
register tool as 'skill' instead of 'slashcommand'
- tools/index.ts: Export createSkillTool instead of createSlashcommandTool
- plugin/tool-execute-before.ts: Update tool name checks from 'slashcommand'
to 'skill'; update arg name from 'command' to 'name'
- agents/dynamic-agent-prompt-builder.ts: Categorize 'skill' tool as 'command'
- tools/skill-mcp/tools.ts: Update hint message to reference 'skill' tool
- hooks/auto-slash-command/executor.ts: Update error message
The slashcommand/ module files are kept (they provide shared utilities used
by the skill tool), but the slashcommand tool itself is no longer registered.
- atlas hook: update verification reminder assertions to match new
4-phase QA system (MANDATORY -> PHASE 1/2, LIE -> LYING)
- auto-update-checker: add missing revertPinnedVersion mock export
to fix SyntaxError in background-update-check tests
Note: 4 auto-update-checker tests fail only when run alongside
checker.test.ts due to bun mock.module isolation issue (pre-existing
in v3.7.3, not a regression)
Rewrite Atlas GPT verification from a checklist to a 4-phase protocol:
Phase 1 (Read Code First), Phase 2 (Automated Checks), Phase 3 (Hands-On QA),
Phase 4 (Gate Decision). Hands-on QA is now mandatory for user-facing changes,
not 'if applicable'. Hook message reinforces subagent distrust and requires
actually running deliverables before proceeding to next task.
spyOn(console, 'log') accumulates calls across test files in bun:test.
Add mockClear() after spy creation to prevent cross-file contamination
when run in the same bun test batch as completion.test.ts.
When no agent panes exist, mainPane.width equals windowWidth, making
agentAreaWidth zero. The early return guard blocked initial pane creation
before the currentCount === 0 handler could execute.
Add currentCount > 0 condition so the guard only fires when agent panes
already exist, allowing the bootstrap handler to evaluate canSplitPane.
Closes#1939
Shorter hook name, disableable via disabled_hooks config, migration added
for backward compatibility. Also forces agent switch to Hephaestus on
Sisyphus + GPT detection. Docs updated with new hook name.
Use getAgentConfigKey() to normalize display names (e.g. 'Sisyphus (Ultraworker)')
back to config keys before comparison. Update toast to 10s duration with clearer
line-broken messaging.
Oracle, Librarian, Explore, Momus, and Metis could modify files via
apply_patch despite being read-only agents. Also fixed duplicate task
entries in Librarian and Explore restriction lists.
When port appears available but binding fails (race with another opencode
instance), retry on next available port (auto mode) or attach to existing
server (explicit port mode) instead of crashing with exit code 1.
Add newlines around agent header for visual separation, dim the thinking
label, and add trailing newline after think block close.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
Guard against duplicate tool header/output rendering when both tool.execute
and message.part.updated fire for the same tool, and prevent message counter
resets when message.updated fires multiple times for the same assistant message.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode) assistance
Replace the single-line "Thinking: <summary>" rendering with direct streaming
of reasoning tokens via writePaddedText. Removes maybePrintThinkingLine and
renderThinkingLine in favor of incremental output with dim styling.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Adds message.part.delta event handling for real-time streaming output,
reasoning/think block display with in-place updates, per-agent profile
colors, padded text output, and semantic tool headers with icons.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
Replace two separate triage skills with one unified skill using 'free' category
for all subagents. Action-oriented: auto-answer questions, analyze bugs,
merge safe PRs. All items tracked via TaskCreate, [sisyphus-bot] comment prefix.
Convert all markdown tables in Sisyphus and dynamic-agent-prompt-builder
to plain bullet lists for cleaner prompt rendering.
Add explicit Oracle safeguards:
- Hard Block: background_cancel(all=true) when Oracle running
- Hard Block: delivering final answer before collecting Oracle result
- Anti-Pattern: background_cancel(all=true) and skipping Oracle
- Oracle section: NEVER cancel, collect via background_output first
- Background Result Collection: split cancel/wait into separate steps
with explicit NEVER use background_cancel(all=true) instruction
- features.md: update explore primary model (grok-code-fast-1), fix all agent fallback chains
- configurations.md: add missing deep category, fix all agent/category provider chains, add hephaestus to available agents, update model names to match actual code
Agent keys are remapped to display names, so preserving `default_agent`
values could still select a missing key at runtime.
This regression surfaced after d94a739203 remapped `config.agent` keys
to display names without canonicalizing configured defaults.
Normalize configured `default_agent` through display-name mapping before
fallback logic and extend tests to cover canonical and display-name
inputs.
Remove applyLayout(select-layout main-vertical) call after spawn which
was destroying grid arrangements by forcing vertical stacking. Now only
enforceMainPaneWidth is called, preserving the grid created by manual
split directions. Also fix enforceMainPaneWidth to use config's
main_pane_size percentage instead of hardcoded 50%.
Agent area width now uses real mainPane.width instead of hardcoded 50%
ratio. Grid planning, split availability, and spawn target finding now
respect user's agent_pane_min_width config instead of hardcoded
MIN_PANE_WIDTH=52, enabling 2-column grid layouts on narrower terminals.
Extends conversion logic to handle Base64-encoded images (e.g., from clipboard).
Previously, unsupported formats like HEIC/RAW/PSD in Base64 form bypassed
the conversion check and caused failures at multimodal-looker agent.
Changes:
- Add convertBase64ImageToJpeg() function in image-converter.ts
- Save Base64 data to temp file, convert, read back as Base64
- Update tools.ts to check and convert Base64 images when needed
- Ensure proper cleanup of all temporary files
Testing:
- All tests pass (29/29)
- Verified with 1.7MB HEIC file converted from Base64
- Type checking passes
Adds automatic conversion of unsupported image formats (HEIC, HEIF, RAW, PSD)
to JPEG before sending to multimodal-looker agent.
Changes:
- Add image-converter.ts module with format detection and conversion
- Modify look_at tool to auto-convert unsupported formats
- Extend mime-type-inference.ts to support 15+ additional formats
- Use sips (macOS) and ImageMagick (Linux/Windows) for conversion
- Add proper cleanup of temporary files
Fixes#722
Testing:
- All existing tests pass (29/29)
- TypeScript type checking passes
- Verified HEIC to JPEG conversion on macOS
Use the configured agent pane width consistently in split target selection and avoid close+spawn churn by replacing the oldest pane when eviction is required.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Restore Assumptions Check and When to Challenge the User from Sisyphus intent gate
- Add proactive explore/librarian firing to CORRECT behavior list
- Strengthen parallel execution with GPT-5.2 tool_usage_rules (parallelize ALL independent calls)
- Embed reporting into each Execution Loop step (Tell user pattern)
- Strengthen Progress Updates with plain-language and WHY-not-just-WHAT guidance
- Add post-edit reporting to Output Contract and After Implementation
- Fix Output Contract preamble conflict (skip empty preambles, but DO report actions)
Add OPENCODE_CLI_RUN_MODE environment variable check to skip all startup
toasts and version checks when running in CLI mode. This prevents
notification spam during automated CLI run sessions.
Includes comprehensive test coverage for CLI run mode behavior.
🤖 Generated with OhMyOpenCode assistance
- Add directory parameter to session API calls (session.get, session.todo,
session.status, session.children)
- Improve agent resolver with display name support via agent-display-names
- Add tool execution visibility in event handlers with running/completed
status output
- Enhance poll-for-completion with main session status checking and
stabilization period handling
- Add normalizeSDKResponse import for consistent response handling
- Update types with Todo, ChildSession, and toast-related interfaces
🤖 Generated with OhMyOpenCode assistance
Add propertyNames: { type: "string" } to all object schemas with
additionalProperties to ensure proper JSON schema validation for
dynamic property keys.
🤖 Generated with OhMyOpenCode assistance
The experimental.session.compacting handler was not delegating to
claudeCodeHooks, making PreCompact hooks from .claude/settings.json
dead code. Also fixed premature early-return when compactionContextInjector
was null which would skip any subsequent hooks.
oh-my-opencode overwrote OpenCode's default_agent with sisyphus whenever
Sisyphus orchestration was enabled. This made explicit defaults like
Hephaestus ineffective and forced manual agent switching in new sessions.
Only assign sisyphus as default when default_agent is missing or blank,
and preserve existing configured values. Add tests for both preservation
and fallback behavior to prevent regressions.
- Move src/tools/session-manager to isolated test section
- Prevents mock.module() pollution across parallel test runs
- Fixes 4 flaky storage tests that failed in CI
- P2: Change replace edit sorting from POSITIVE_INFINITY to NEGATIVE_INFINITY
so replace edits run LAST after line-based edits, preventing line number
shifts that would invalidate subsequent anchors
- P3: Update tool description from SHA-256 to xxHash32 to match actual
implementation in hash-computation.ts
Category-based delegation (task(category='quick')) was broken because
SISYPHUS_JUNIOR_AGENT sent 'sisyphus-junior' to session.prompt but
config.agent keys are now display names ('Sisyphus-Junior').
- Use getAgentDisplayName() for SISYPHUS_JUNIOR_AGENT constant
- Replace hardcoded 'sisyphus-junior' strings in tools.ts with constant
- Update background-output local constants to use display names
- Add remapCommandAgentFields() to translate command agent fields
- Add raw-key fallback in tool-config-handler agentByKey()
Use display names as config.agent keys so opencode shows proper names in UI
(Tab/@ menu). Key remapping happens after all agents are assembled but before
reordering, via remapAgentKeysToDisplayNames().
- agent-config-handler: set default_agent to display name, add key remapping
- agent-key-remapper: new module to transform lowercase keys to display names
- agent-priority-order: CORE_AGENT_ORDER uses display names
- tool-config-handler: look up agents by config key via agentByKey() helper
Remove crash-causing name fields from 6 agent configs (sisyphus, hephaestus,
atlas, metis, momus, prometheus). The name field approach breaks opencode
because Agent.get(agent.name) uses name as lookup key.
Add getAgentConfigKey() to agent-display-names.ts for resolving display names
back to lowercase config keys (e.g. 'Atlas (Plan Executor)' -> 'atlas').
- getSdkMessages now handles both response.data and direct array
responses from SDK
- Consolidated getMessageDir: storage.ts now re-exports from shared
opencode-message-dir.ts (with path traversal guards)
- Replace all response.data ?? [] with (response.data ?? response)
pattern across 14 files to handle SDK array-shaped responses
- Normalize SDK parts in parts-reader.ts by injecting sessionID/
messageID before validation (P1: SDK parts lack these fields)
- Treat unknown part types as having content in
recover-empty-content-message-sdk.ts to prevent false placeholder
injection on image/file parts
- Replace local isRecord with shared import in parts-reader.ts
- P1: Use compacted timestamp check instead of nonexistent truncated
field in target-token-truncation.ts
- P1: Use defensive (response.data ?? response) pattern in
hook-message-injector/injector.ts to match codebase convention
- P2: Filter by tool type in countTruncatedResultsFromSDK to avoid
counting non-tool compacted parts
- P2: Treat thinking/meta-only messages as empty in both
empty-content-recovery-sdk.ts and message-builder.ts to align
SDK path with file-based logic
- Re-read messages from SDK after injectTextPartAsync to prevent stale
snapshot from causing duplicate placeholder injection (P2)
- Replace local isRecord with shared import from record-type-guard (P3)
- P2: treat unknown part types as non-content in message-builder messageHasContentFromSDK
- P3: reuse shared isRecord from record-type-guard.ts in opencode-http-api
Unknown part types should be treated as content (return true)
to match parity with the existing message-builder implementation.
Using continue would incorrectly mark messages with unknown part
types as empty, triggering false recovery.
- isTodo: allow optional id to match Todo interface, preventing
todos without ids from being silently dropped
- messageHasContentFromSDK: treat unknown part types as empty
(continue) instead of content (return true) for parity with
existing storage logic
- readMessagesFromSDK in recover-empty-content-message-sdk: wrap
SDK call in try/catch to prevent recovery from throwing
The previous commit incorrectly removed this function and its test
as dead code. While the local implementations in other files have
different return types (MessageData[], MessagePart[]) and cannot be
replaced by this shared version, the function is a valid tested
utility. Deleting tests is an anti-pattern in this project.
- Encode path segments with encodeURIComponent in HTTP API URLs
to prevent broken requests when IDs contain special characters
- Remove unused readMessagesFromSDK from messages-reader.ts
(production callers use local implementations; dead code)
- Reduce timeout from 500ms to 200ms to lower CI execution time
- Add 10ms margin to elapsed time check for scheduler variance
- Replace pc.dim() string matching with call count assertion
to avoid ANSI escape code mismatch on CI runners
- target-token-truncation: eliminate redundant SDK messages fetch by
extracting tool results from already-fetched toolPartsByKey map
- recover-thinking-block-order: wrap SDK message fetches in try/catch
so recovery continues gracefully on API errors
- thinking-strip: guard against missing part.id before calling
deletePart to prevent invalid HTTP requests
The messages() mock in 'session_id with background=false' test did not
filter by session ID, causing resolveParentContext's SDK calls for
parent-session to increment messagesCallCount. This inflated
anchorMessageCount to 4 (matching total messages), so the poll loop
could never detect new messages and always hit MAX_POLL_TIME_MS.
Fix: filter messages() mock by path.id so only target session
(ses_continue_test) increments the counter. Restore MAX_POLL_TIME_MS
from 8000 back to 2000.
- runner.test.ts: waitForEventProcessorShutdown timeout 50ms → 500ms
(50ms was consistently too tight for CI runners)
- tools.test.ts: MAX_POLL_TIME_MS 2000ms → 8000ms
(polling timed out at ~2009ms on CI due to resource contention)
sessionExists() previously returned unconditional true on SQLite,
preventing ralph-loop orphaned-session cleanup from triggering.
Now uses sdkClient.session.messages() to verify session actually
exists. Callers updated to await the async result.
Addresses Cubic review feedback on PR #1837.
- executeDeduplication: now async, reads messages from SDK on SQLite via
client.session.messages() instead of JSON file reads
- truncateToolOutputsByCallId: now async, uses truncateToolResultAsync()
HTTP PATCH on SQLite instead of file-based truncateToolResult()
- deduplication-recovery: passes client through to both functions
- recovery-hook: passes ctx.client to attemptDeduplicationRecovery
Removes the last intentional feature gap on SQLite backend — dynamic
context pruning (dedup + tool-output truncation) now works on both
JSON and SQLite storage backends.
sessionExists() relied on JSON message directories which don't exist on
SQLite. Return true on SQLite and let readSessionMessages() handle lookup.
Also add empty-messages fallback in session_read for graceful not-found.
On SQLite backend, readParts() returns [] since JSON files don't exist.
Add isSqliteBackend() branch that reads parts from SDK via
client.session.messages() when failedAssistantMsg.parts is empty.
Rewrite opencode-message-dir.test.ts to use real temp directories instead
of mocking node:fs/node:path. Rewrite opencode-storage-detection.test.ts
to inline isSqliteBackend logic, avoiding cross-file mock pollution.
Resolves all 195 bun test failures (195 → 0). Full suite: 2707 pass.
On machines running OpenCode beta (v1.1.53+) with SQLite backend,
getMessageDir() returns null because isSqliteBackend() returns true.
This caused all 15 message-storage-dependent tests to fail.
Fix: mock opencode-storage-detection to force JSON mode, and use
ses_ prefixed session IDs to match getMessageDir's validation.
- Create src/shared/opencode-storage-paths.ts with all 4 constants
- Update 4 previous declaration sites to import from shared file
- Update additional OPENCODE_STORAGE usages for consistency
- Re-export from src/shared/index.ts
- No duplicate constant declarations remain
- Use shared OPENCODE_STORAGE, MESSAGE_STORAGE, PART_STORAGE constants
- Make isCallerOrchestrator async with SDK fallback for beta
- Fix cache implementation using Symbol sentinel
- Update atlas hooks and sisyphus-junior-notepad to use async isCallerOrchestrator
- When deleting tasks, prefer matching by id if present
- Fall back to content matching only when todo has no id
- Prevents deleting unrelated todos with same subject
- Add content-based fallback matching for todos without ids
- Add TODO comment for exported but unused SDK functions
- Add resetStorageClient() for test isolation
- Fixes todo duplication risk on beta (SQLite backend)
- Add src/shared/opencode-storage-paths.ts with consolidated constants
- Update imports in hook-message-injector and session-manager
- Add src/shared/opencode-storage-detection.ts with isSqliteBackend()
- Add OPENCODE_SQLITE_VERSION constant
- Export all from shared/index.ts
- Make id field optional in all Todo interfaces (TodoInfo, Todo, TodoItem)
- Fix null-unsafe comparisons in todo-sync.ts to handle missing ids
- Add test case for todos without id field preservation
- All tests pass and typecheck clean
The polling loop in executeUnstableAgentTask only checked session status
and message stability, never checking if the background task itself had
been interrupted. This caused the tool call to hang until MAX_POLL_TIME_MS
(10 minutes) when a task was interrupted by prompt errors.
Add manager.getTask() check at each poll iteration to break immediately
on terminal statuses (interrupt, error, cancelled), returning a clear
failure message instead of hanging.
Reduce plan-template from 541 to 335 lines by removing redundant verbose
examples while recovering 3 lost context items: tool-type mapping table in
QA Policy, scenario specificity requirements (selectors/data/assertions/
timing/negative) in TODO template, and structured output format hints for
each Final Verification agent.
Strengthen TODO template to make QA scenarios non-optional with explicit
rejection warning. Add Final Verification Wave with 4 parallel review
agents: oracle (plan compliance audit), unspecified-high (code quality),
unspecified-high (real manual QA), deep (scope fidelity check) — each
with detailed verification steps and structured output format.
Add Maximum Parallelism Principle as a top-level constraint and replace
small-scale plan template examples (6 tasks, 3 waves) with production-scale
examples (24 tasks, 4 waves, max 7 concurrent) to steer the model toward
generating fine-grained, dependency-minimized plans by default.
TaskToastManager entries were never removed when tasks completed via
error, session deletion, stale pruning, or cancelled with
skipNotification. Ghost entries accumulated indefinitely, causing the
'Queued (N)' count in toast messages to grow without bound.
Added toastManager.removeTask() calls to all 4 missing cleanup paths:
- session.error handler
- session.deleted handler
- cancelTask with skipNotification
- pruneStaleTasksAndNotifications
Closes#1866
First message variant gate was unconditionally overwriting message.variant
with the fallback chain value (e.g. 'medium' for Hephaestus), ignoring
any variant the user had already selected via OpenCode UI.
Now checks message.variant === undefined before applying the resolved
variant, matching the behavior already used for subsequent messages.
Closes#1861
OpenCode 1.2.0+ changed reasoning-delta and text-delta to emit
'message.part.delta' instead of 'message.part.updated'. Without
handling this event, lastUpdate was only refreshed at reasoning-start
and reasoning-end, leaving a gap where extended thinking (>3min)
could trigger stale timeout.
Accept both event types as heartbeat sources for forward compatibility.
OpenCode uses 'busy'/'retry'/'idle' session statuses, not 'running'.
The stale timeout guard checked for type === 'running' which never
matched, leaving all background tasks vulnerable to stale-kill even
when their sessions were actively processing.
Change sessionIsRunning to check type !== 'idle' instead, protecting
busy and retrying sessions from premature termination.
The stale detection was checking lastUpdate timestamps BEFORE
consulting session.status(), causing tasks to be unfairly killed
after 3 minutes even when the session was actively running
(e.g., during long tool executions or extended thinking).
Changes:
- Reorder pollRunningTasks to fetch session.status() before stale check
- Skip stale-kill entirely when session status is 'running'
- Port no-lastUpdate handling from task-poller.ts into manager.ts
(previously manager silently skipped tasks without lastUpdate)
- Add sessionStatuses parameter to checkAndInterruptStaleTasks
- Add 7 new test cases covering session-status-aware stale detection
Both parent-session-notifier.ts and notify-parent-session.ts now include
parentTools in the promptAsync body, ensuring tool restrictions are
consistently applied across all notification code paths.
Pass parentTools from session-tools-store through the background task
lifecycle (launch → task → notify) so that when notifyParentSession
sends promptAsync, the original tool restrictions (e.g., question: false)
are preserved. This prevents the Question tool from re-enabling after
call_omo_agent background tasks complete.
Call setSessionTools(sessionID, tools) before every prompt dispatch so
the tools object is captured and available for later retrieval when
background tasks complete.
In-memory Map-based store that records tool restriction objects (e.g.,
question: false) by sessionID when prompts are sent. This enables
retrieving the original session's tool parameters when background tasks
complete and need to notify the parent session.
Add discoverProjectAgentsSkills() for project-level .agents/skills/ and
discoverGlobalAgentsSkills() for ~/.agents/skills/ — matching OpenCode's
native skill discovery paths (https://opencode.ai/docs/skills/).
Updated discoverAllSkills(), discoverSkills(), and createSkillContext()
to include these new sources with correct priority ordering.
Co-authored-by: dtateks <dtateks@users.noreply.github.com>
Closes#1818
The agent_pane_min_width config value was accepted in the schema and
passed as CapacityConfig.agentPaneWidth but never actually used — the
underscore-prefixed _config parameter in decideSpawnActions was unused,
and all split/capacity calculations used the hardcoded MIN_PANE_WIDTH.
Now decideSpawnActions, canSplitPane, isSplittableAtCount,
findMinimalEvictions, and calculateCapacity all accept and use the
configured minimum pane width, falling back to the default (52) when
not provided.
Closes#1781
Add test coverage for detectErrorType and extractMessageIndex with
edge cases: circular references, non-standard proxy errors, null input.
Wrap both functions in try/catch to prevent crashes from malformed
error objects returned by non-standard providers like Antigravity.
LLMs sometimes pass load_skills as a serialized JSON string instead
of an array. Add defensive JSON.parse before validation to handle
this gracefully.
Fixes#1701
Community-reported-by: @omarmciver
Z.ai GLM models don't support thinking/reasoning parameters.
Ensure these are omitted entirely to prevent empty responses.
Fixes#980
Community-reported-by: @iserifith
The isBinaryAvailableOnWindows() function used spawnSync("where")
which fails even when the binary IS on PATH, causing false negatives.
Removed the redundant pre-check and let nodeSpawn handle binary
resolution naturally with proper OS-level error messages.
Fixes#1805
The waitForEventProcessorShutdown test was comparing exact string match
against console.log spy, but picocolors wraps the message in ANSI dim
codes. On CI (bun 1.3.9) this caused the assertion to fail. Use
string containment check instead of exact argument match.
Tasks with no progress.lastUpdate were silently skipped in
checkAndInterruptStaleTasks, causing them to hang forever when the model
hangs before its first tool call. Now falls back to checking startedAt
against a configurable messageStalenessTimeoutMs (default: 10 minutes).
Closes#1769
loader.test.ts creates and deletes temp directories via process.chdir()
which causes 'current working directory was deleted' errors for subsequent
tests running in the same process. Move it to isolated step and enumerate
remaining skill-loader test files individually.
formatter.test.ts, format-default.test.ts, sync-executor.test.ts, and
session-creator.test.ts use mock.module() which pollutes bun's module
cache. Previously they ran both in the isolated step AND again in the
remaining tests step (via src/cli and src/tools wildcards), causing
cross-file contamination failures.
Now the remaining tests step enumerates subdirectories explicitly,
excluding the 4 mock-heavy files that are already run in isolation.
Reasoning/thinking models (Oracle, Claude Opus) were being killed by the
stale timeout because lastUpdate was only refreshed on tool-type events.
During extended thinking, no tool events fire, so after 3 minutes the
task was incorrectly marked as stale and aborted.
Move progress initialization and lastUpdate refresh before the tool-type
conditional so any message.part.updated event (text, thinking, tool)
keeps the task alive.
The non-interactive-env hook was prepending environment variables without checking
if the prefix was already applied to the command, causing duplication when multiple
git commands were executed in sequence.
This fix adds an idempotent check: if the command already starts with the env prefix,
the hook returns early without modification. This maintains the non-interactive behavior
while ensuring the operation is idempotent across multiple tool executions.
Remove boulder session restriction (f84ef532) and stagnation cap (10a60854)
that prevented continuation from firing in regular sessions.
Changes:
- Remove boulder/subagent session gate in idle-event.ts — continuation now
fires for ANY session with incomplete todos, as originally intended
- Remove stagnation cap (MAX_UNCHANGED_CYCLES) — agent must keep rolling
the boulder until all todos are complete, no giving up after 3 attempts
- Remove lastTodoHash and unchangedCycles from SessionState type
- Keep 30s cooldown (CONTINUATION_COOLDOWN_MS) as safety net against
re-injection loops
- Update tests: remove boulder gate tests, update stagnation test to verify
continuous injection, update non-main-session test to verify injection
42 tests pass, typecheck and build clean.
Records task history at 6 status transitions (pending, running×2, error,
cancelled, completed). Exports TaskHistory from background-agent barrel.
Passes backgroundManager and sessionID through compaction hook chain.
Adds §8 to compaction prompt instructing the LLM to preserve spawned agent
session IDs and resume them post-compaction instead of starting fresh.
Injects actual TaskHistory data when BackgroundManager is available.
In-memory tracker that survives BackgroundManager's cleanup cycles.
Records agent delegations with defensive copies, MAX 100 cap per parent,
undefined-safe upsert, and newline-sanitized formatForCompaction output.
Rename header to oMoMoMoMo Doctor to match installation guide branding.
Remove providers check entirely — no longer meaningful for diagnostics.
Fix comment-checker detection by resolving @code-yeongyu/comment-checker package path
in addition to PATH lookup.
When a custom agent's last assistant message contains only tool calls (no text/reasoning parts), the sync result fetcher returned empty content. Walk assistant messages newest-first to find the first one with actual text content.
Replace PreToolUse hook-based question tool blocking with the existing
tools parameter approach (tools: { question: false }) which physically
removes the tool from the LLM's toolset before inference.
The hook was redundant because every session.prompt() call already passes
question: false via the tools parameter. OpenCode converts this to a
PermissionNext deny rule and deletes the tool from the toolset, preventing
the LLM from even seeing it. The hook only fired after the LLM already
called the tool, wasting tokens.
Changes:
- Remove subagent-question-blocker hook invocation from PreToolUse chain
- Remove hook registration from create-session-hooks.ts
- Delete src/hooks/subagent-question-blocker/ directory (dead code)
- Remove hook from HookNameSchema and barrel export
- Fix sync-executor.ts missing question: false in tools parameter
- Add regression tests for both the removal and the tools parameter
- patch: ask user whether to add enhanced summary (skippable)
- minor/major: enhanced summary is now mandatory, not optional
- Update TODO descriptions and skip conditions accordingly
Extends the process.cwd() fix to cover all project-level loaders. In the desktop app, process.cwd() points to the app installation directory instead of the project directory, causing project-level agents, commands, and slash commands to not be discovered. Each function now accepts an optional directory parameter (defaulting to process.cwd() for backward compatibility) and callers pass ctx.directory from the plugin context.
OpenCode TUI reads input.subagent_type to display task type. When
subagent_type was missing (e.g., category-only or session continuation),
TUI showed 'Unknown Task'.
Fix:
- category provided: always set subagent_type to 'sisyphus-junior'
(previously only when subagent_type was absent)
- session_id continuation: resolve agent from session's first message
- fallback to 'continue' if session has no agent info
Project-level skills (.opencode/skills/ and .claude/skills/) were not
discovered in desktop app environments because the discover functions
hardcoded process.cwd() to resolve project paths. In desktop apps,
process.cwd() points to the app installation directory rather than the
user's project directory.
Add optional directory parameter to all project-level skill discovery
functions and thread ctx.directory from the plugin context through the
entire skill loading pipeline. Falls back to process.cwd() when
directory is not provided, preserving CLI compatibility.
isGptModel only matched openai/ and github-copilot/gpt- prefixes, causing
models like litellm/gpt-5.2 to fall into the Claude code path. This
injected Claude-specific thinking config, which the opencode runtime
translated into a reasoningSummary API parameter — rejected by OpenAI.
Extract model name after provider prefix and match against GPT model
name patterns (gpt-*, o1, o3, o4).
Closes#1788
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Port devxoul's PR #821 feature to current codebase structure.
Supports absolute, relative, ~/home paths with percent-encoding.
Gracefully handles malformed URIs and missing files with warnings.
Co-authored-by: devxoul <devxoul@gmail.com>
Commit 598a4389 refactored config-handler into separate modules but
dropped the disabledMcps parameter from loadMcpConfigs() and did not
handle the spread-order overwrite where .mcp.json MCPs (hardcoded
enabled:true) overwrote user's enabled:false from opencode.json.
Changes:
- Re-add disabledMcps parameter to loadMcpConfigs() in loader.ts
- Capture user's enabled:false MCPs before merge, restore after
- Pass disabled_mcps to loadMcpConfigs for .mcp.json filtering
- Delete disabled_mcps entries from final merged result
- Add 8 new tests covering both fixes
resolveSymlink and resolveSymlinkAsync incorrectly resolved relative
symlinks by using path.resolve(filePath, '..', linkTarget). This fails
when symlinks use multi-level relative paths (e.g. ../../skills/...) or
when symlinks are chained (symlink pointing to a directory containing
more symlinks).
Replace with fs.realpathSync/fs.realpath which delegates to the OS for
correct resolution of all symlink types: relative, absolute, chained,
and nested.
Fixes#1738
AI-assisted-by: claude-opus-4.6 via opencode
AI-contribution: partial
AI-session: 20260212-120629-4gTXvDGV
When delegate-task resumes a session via session_id, the response
task_metadata now includes a subagent field identifying which agent
was running in the resumed session. This allows the parent agent to
know what type of subagent it is continuing.
- sync-continuation: uses resumeAgent extracted from session messages
- background-continuation: uses task.agent from BackgroundTask object
- Gracefully omits subagent when agent info is unavailable
Previously, executeStopHooks returned immediately after the first hook
that produced valid JSON stdout, even if it was non-blocking. This
prevented subsequent hooks from executing.
This was problematic when users had multiple Stop hooks (e.g.,
check-console-log.js + task-complete-notify.sh in settings.json),
because the first hook's stdout (which echoed stdin data as JSON)
caused an early return, silently skipping all remaining hooks.
Now only explicitly blocking results (exit code 2 or decision=block)
cause an early return, matching Claude Code's behavior of executing
all Stop hooks sequentially.
Closes#1707
The SSE event stream subscription was missing the directory parameter,
causing the OpenCode server to only emit global events (heartbeat,
connected, toast) but not session-scoped events (session.idle,
session.status, tool.execute, message.updated, message.part.updated).
Without session events:
- hasReceivedMeaningfulWork stays false (no message/tool events)
- mainSessionIdle never updates (no session.idle/status events)
- pollForCompletion either hangs or exits for unrelated reasons
Fix: Pass { directory } to client.event.subscribe(), matching the
pattern already used by client.session.promptAsync().
Also adds a stabilization period (10s) after first meaningful work
as defense-in-depth against early exit race conditions.
Previously, a single validation error (e.g. wrong type for
prometheus.permission.edit) caused safeParse to fail and the
entire oh-my-opencode.json was silently replaced with {}.
Now loadConfigFromPath falls back to parseConfigPartially() which
validates each top-level key in isolation, keeps the sections that
pass, and logs which sections were skipped.
Closes#1767
The hook used exact string equality (agentName !== "prometheus") which fails
when display names like "Prometheus (Plan Builder)" are stored in session state.
Replace with case-insensitive substring matching via isPrometheusAgent() helper,
consistent with the pattern used in keyword-detector hook.
Closes#1764 (Bug 3)
- Detect namespaced commands (containing ':') from Claude marketplace plugins
- Provide clear error message explaining marketplace plugins are not supported
- Point users to .claude/commands/ as alternative for custom commands
- Fixes issue where /daplug:run-prompt gave ambiguous 'command not found'
Closes#1682
Replace path.startsWith('/') with path.isAbsolute() in directory
injector hooks. The startsWith('/') check only works on Unix-like
systems where absolute paths begin with '/'. On Windows, absolute
paths start with drive letters (e.g., C:\), causing resolveFilePath
to incorrectly treat them as relative and prepend the project
directory.
This follows the same pattern already used in
src/features/claude-tasks/storage.ts (commit 8e349aa).
Affected hooks:
- directory-agents-injector: AGENTS.md injection
- directory-readme-injector: README.md injection
MCP tool responses can have undefined output.output, causing TypeError
crashes in tool.execute.after hooks.
Changes:
- comment-checker/hook.ts: guard output.output with ?? '' before toLowerCase()
- edit-error-recovery/hook.ts: guard output.output with ?? '' before toLowerCase()
- task-resume-info/hook.ts: extract output.output ?? '' into outputText before all string operations
- Added tests for undefined output.output in edit-error-recovery and task-resume-info
When a user pins oh-my-opencode to a specific version (e.g., oh-my-opencode@3.4.0),
the auto-update checker now respects that choice and only shows a notification toast
instead of overwriting the pinned version with latest.
- Skip updatePinnedVersion() when pluginInfo.isPinned is true
- Show update-available toast only (notification, no modification)
- Added comprehensive tests for pinned/unpinned/autoUpdate scenarios
Fixes#1745
MCP tools can return non-string results (e.g. structured JSON objects).
When this happens, output.output is undefined, causing TypeError crashes
in edit-error-recovery and delegate-task-retry hooks that call methods
like .toLowerCase() without checking the type first.
Add typeof string guard in both hooks, consistent with the existing
pattern used in tool-output-truncator.
Allow individual categories to be disabled via `disable: true` in
config. Introduce shared `mergeCategories()` utility to centralize
category merging and disabled filtering across all 7 consumption sites.
When bun install fails after updating the config pin, the config now shows the
new version but the actual package is the old one. Add revertPinnedVersion() to
roll back the config entry on install failure, keeping config and installed
version in sync.
Ref #1472
Child session creation was injecting permission: { question: 'deny' } which
conflicted with OpenCode's child session permission handling, causing subagent
sessions to hang with 0 messages after creation (zombie state).
Remove the permission override from all session creators (BackgroundManager,
sync-session-creator, call-omo-agent) and rely on prompt-level tool restrictions
(tools.question=false) to maintain the intended policy.
Closes#1711
Instructs the orchestrator to read subagent notepad files
(.sisyphus/notepads/{planName}/) after task completion, ensuring
learnings, issues, and problems are propagated to subsequent delegations.
boulderState?.session_ids.includes() only guards boulderState, not
session_ids. If boulder.json is corrupted or missing the field,
session_ids is undefined and .includes() crashes silently, losing
subagent results.
Changes:
- readBoulderState: validate parsed JSON is object, default session_ids to []
- atlas hook line 427: boulderState?.session_ids?.includes
- atlas hook line 655: boulderState?.session_ids?.includes
- prometheus-md-only line 93: boulderState?.session_ids?.includes
- appendSessionId: guard with ?. and initialize to [] if missing
Fixes#1672
GitHub Web UI commits have web-flow as the author/committer,
causing CLA checks to fail even after the contributor signs.
Adding web-flow to the allowlist resolves this for all
contributors who edit files via the GitHub web interface.
df0b9f76 regressed look_at from synchronous prompt (session.prompt) to
async prompt (session.promptAsync) + pollSessionUntilIdle polling. This
introduced a race condition where the poller fires before the server
registers the session as busy, causing it to return immediately with no
messages available.
Fix: restore promptSyncWithModelSuggestionRetry (blocking HTTP call) and
remove polling entirely. Catch prompt errors gracefully and still attempt
to fetch messages, since session.prompt may throw even on success.
Closes#1716
## Summary
- Added 4 tests for disabled_agents validation in call_omo_agent tool
- Tests verify agent rejection when in disabled_agents list
- Tests verify case-insensitive matching
- Tests verify agents not in disabled list are allowed
- Tests verify empty disabled_agents allows all agents
Migration changes were only applied to rawConfig if file write succeeded,
leaving the running process on stale config. Also stops logging backup
path when the backup copy itself failed.
Reasoning parts could contain completion-like text triggering false
positives. Also handles session.messages returning either an array
or {data: [...]} shape.
DEFAULT_TIMEOUT_MS was 0 (no timeout), causing opencode run to hang forever
if the session never completed. Also attached .catch() to processEvents()
immediately to prevent unhandled promise rejections before Promise.race.
Main session was unconditionally allowed through the boulder session guard,
causing continuation injection into sessions not part of the active boulder.
Now only sessions explicitly in boulder's session_ids (or background tasks)
receive boulder continuation, matching todo-continuation-enforcer behavior.
- fix(background): include "interrupt" status in all terminal status checks (3 files)
- fix(background): display "INTERRUPTED" instead of "CANCELLED" for interrupted tasks
- fix(cli): add error recovery grace period in poll-for-completion
- fix(lsp): use JSONC parser for config loading to support comments
All changes verified with tests and typecheck.
- fix(delegate-task): return error on poll timeout instead of silent null
- fix(delegate-task): ensure toast and session cleanup on all error paths with try/finally
- fix(delegate-task): apply agent tool restrictions in sync-prompt-sender
- fix(plugin): add symmetric idle dedup to prevent double hook triggers
- fix(cli): replace regex-based JSONC editing with jsonc-parser in auth-plugins
- fix(cli): abort event stream after completion and restore no-timeout default
All changes verified with tests and typecheck.
- Add comprehensive TODO SYNC section documenting automatic
bidirectional sync between tasks and OpenCode todo system
- Improve sync-continuation.test.ts with proper mock modules
for pollSyncSession and fetchSyncResult dependencies
Replace unified 'task' tool documentation with 4 individual tools:
- task_create: Create task with auto-generated T-{uuid} ID
- task_list: List active tasks with summary
- task_get: Retrieve full task object by ID
- task_update: Update task fields with dependency support
Add detailed TASK TOOLS section with args tables and usage examples.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Add tests for sync-continuation error paths and toast cleanup
- Add tests for sync-result-fetcher with anchor message support
- Expand sync-session-poller tests for edge cases and completion detection
- Add bulk cleanup test for recent-synthetic-idles
- Add proper error handling in executeSyncContinuation with try-catch blocks
- Ensure toast cleanup happens in all error paths via finally block
- Add anchorMessageCount tracking for accurate result fetching after continuation
- Improve fetchSyncResult to filter messages after anchor point
- Add silent failure detection when no new response is generated
- VERIFICATION_REMINDER: add Step 2 manual code review (non-negotiable)
- Require Read of EVERY changed file line by line
- Cross-check subagent claims vs actual code
- Verify logic correctness, completeness, edge cases, patterns
- Add Step 5: direct boulder state check via Read plan file
- Count remaining tasks directly, no cached state
- BOULDER_CONTINUATION_PROMPT: add first rule to read plan file immediately
- verification-reminders.ts: restructure steps 5-8 for boulder/todo checks
- Atlas default.ts (Claude): enhance 3.4 QA with A/B/C/D sections
- A: Automated verification
- B: Manual code review (non-negotiable)
- C: Hands-on QA (if applicable)
- D: Check boulder state directly
- Atlas gpt.ts (GPT-5.2): apply same QA enhancements with GPT-optimized structure
- verification_rules: update both Claude and GPT versions with manual review requirements
Addresses issue where Atlas would skip manual code inspection after delegation,
leading to rubber-stamping of broken or incomplete work.
The todo-continuation-enforcer was firing boulder continuation in ALL main
sessions with incomplete todos, regardless of whether /start-work was ever
executed. This caused unwanted BOULDER CONTINUATION directives in sessions
that never invoked /start-work.
Changes:
- Add readBoulderState check in idle-event.ts to verify session is registered
in boulder.json's session_ids array
- Change filter condition from main session check to boulder session check
- Add 4 new test cases for boulder session gate behavior
- Update all existing 41 tests to set up boulder state appropriately
Now boulder continuation only fires when:
1. Session is in boulder.json's session_ids (/start-work was executed), OR
2. Session is a background task session (subagent)
TDD cycle:
- RED: 2 new tests failed as expected (no boulder check in implementation)
- GREEN: Implementation added, all 41 tests pass
- REFACTOR: Full test suite 2513 pass, typecheck & build clean
Expand prompt structure comment to 4-field format (CONTEXT/GOAL/DOWNSTREAM/REQUEST).
Update all explore/librarian task() examples across Sisyphus, Hephaestus,
Prometheus interview-mode, and both ultrawork variants with richer context
including downstream usage, scope limits, and return format expectations.
OpenCode SDK does not expose client.model.list API. This caused the
provider-models cache to always be empty (models: {}), which in turn
caused delegate-task categories with requiresModel (e.g., 'deep',
'artistry') to fail with misleading 'Unknown category' errors.
Changes:
- connected-providers-cache.ts: Extract models from provider.list()
response's .all array instead of calling non-existent client.model.list
- category-resolver.ts: Distinguish between 'unknown category' and
'model not available' errors with clearer error messages
- Add comprehensive tests for both fixes
Bug chain:
client.model?.list is undefined -> empty cache -> isModelAvailable
returns false for requiresModel categories -> null returned from
resolveCategoryConfig -> 'Unknown category' error (wrong message)
ast-grep CLI silently ignores --update-all when --json=compact is
present, causing replace operations to report success while never
modifying files. Split into two separate CLI invocations.
Add session-status-normalizer to handle session.status events and
convert idle status to synthetic session.idle events. Includes
deduplication logic to prevent duplicate idle events within 500ms.
When boulderState.agent is not explicitly set (defaults to 'atlas'),
allow continuation for sessions where the last agent is 'sisyphus'.
This fixes the issue where boulder continuation was skipped when
Sisyphus took over the conversation after boulder creation.
Replace mock.module with spyOn + mockRestore to prevent fs module
pollution across test files. mock.module replaces the entire module
and caused 69 test failures in other files that depend on fs.
When task_system is enabled, the prompt said 'task tool: BLOCKED' which
LLMs interpreted as blocking task_create/task_update/task_list/task_get
too. Now the constraints section explicitly separates 'task (agent
delegation tool): BLOCKED' from 'task_create, task_update, ...: ALLOWED'
so Junior no longer refuses to use task management tools.
sisyphus-junior prompt always used todo-based discipline text regardless of
experimental.task_system setting because the useTaskSystem flag was never
forwarded from agent-config-handler to createSisyphusJuniorAgentWithOverrides.
Update notification systems to display INTERRUPTED status.
Add interrupt handling to background_output tool (terminal status).
Add interrupt-specific status note to formatTaskStatus.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Change promptAsync catch blocks to set status = "interrupt" instead of "error".
This distinguishes prompt errors from stale timeouts (cancelled) and TTL expirations (error).
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Category-based delegation should always route to sisyphus-junior even if
subagent_type is mistakenly provided, matching the original behavior and
preventing accidental bypass of category routing.
Sync task completion was fragile — detecting premature stability during
brief idle periods between tool calls. Now mirrors opencode's native
SessionPrompt.loop() logic: checks assistant finish reason is terminal
(not tool-calls/unknown) and assistant.id > user.id.
Also switches sync prompt sender from blocking HTTP (promptSync) to
async fire-and-forget (promptAsync) to avoid JSON parse errors in ACP.
session.prompt() may throw {} or JSON parse errors even when the server
successfully processes the request. Instead of crashing the tool, catch
all errors and proceed to fetch messages — if the response is available,
return it; otherwise return a clean error string.
Skill loaders previously only told agents that @path references are
relative to the skill directory, but agents often failed to resolve
them. Now @path/with/slash patterns are automatically expanded to
absolute paths during template construction.
pollForCompletion exited immediately when session went idle before agent
created TODOs or registered children (0 todos + 0 children = vacuously
complete). Add consecutive stability checks (3x500ms debounce) and
currentTool guard to prevent premature exit.
Extract pollForCompletion to dedicated module for testability.
Dev removed task-continuation-enforcer entirely. Remove all remaining
references from plugin hooks, event handler, tool-execute-before, and
config schema to align with origin/dev.
Previously, migrateConfigFile() mutated rawConfig directly. If the file
write failed (e.g. read-only file, permissions), the in-memory config was
already changed to the migrated values, causing the plugin to use migrated
models even though the user's file was untouched. On the next run, the
migration would fire again since _migrations was never persisted.
Now all mutations happen on a structuredClone copy. The original rawConfig
is only updated after the file write succeeds. If the write fails,
rawConfig stays untouched and the function returns false.
- Add improved type definitions for skill resolution
- Enhance executor with better type safety for delegation flows
- Add comprehensive test coverage for delegation tool behavior
- Improve code organization for skill resolver integration
🤖 Generated with assistance of OhMyOpenCode
- Use namespace import for connected-providers-cache for better clarity
- Add explicit type annotation for modelsByProvider to improve type safety
- Update tests to reflect refactored module organization
- Improve code organization while maintaining functionality
🤖 Generated with assistance of OhMyOpenCode
- Deny question prompts in CLI run mode since there's no TUI to answer them
- Inherit parent session permission rules in background task sessions
- Force deny questions while preserving other parent permission settings
- Add test coverage for permission inheritance behavior
🤖 Generated with assistance of OhMyOpenCode
Add permission: [{ permission: 'question', action: 'deny', pattern: '*' }]
to client.session.create() call to prevent background sessions from
asking questions that go unanswered, causing hangs.
- Create available-models.ts for model availability checking
- Create model-selection.ts for category-to-model resolution logic
- Update category-resolver, subagent-resolver, and sync modules to import
from new focused modules instead of monolithic sources
- Change BackgroundManager import to type-only to prevent global process
listener pollution across parallel test files
- Replace real BackgroundManager construction with createMockBackgroundManager
- Fix nested spyOn in config-handler tests to reuse beforeEach spy via
mockResolvedValue instead of re-spying inside test bodies
Migrate imports from monolithic `model-availability` to split modules
(`model-name-matcher`, `available-models-fetcher`, `model-cache-availability`).
Replace XDG_CACHE_HOME env var manipulation with `mock.module` for
`data-path`, ensuring test isolation without polluting process env.
🤖 Generated with assistance of [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
- Add skill-resolver.ts for resolving skill configurations
- Handles skill loading and configuration resolution
- Part of modular delegate-task refactoring effort
🤖 Generated with assistance of OhMyOpenCode
- Export session-storage from claude-tasks/index.ts
- Add CLAUDE_CODE_TASK_LIST_ID fallback support in storage.ts
- Add comprehensive tests for CLAUDE_CODE_TASK_LIST_ID handling
- Prefer ULTRAWORK_TASK_LIST_ID, fall back to CLAUDE_CODE_TASK_LIST_ID
- Both env vars are properly sanitized for path safety
🤖 Generated with assistance of OhMyOpenCode
- Convert index.ts to clean barrel export
- Extract hook implementation to hook.ts
- Extract terminal parsing to parser.ts
- Extract state management to state-manager.ts
- Reduce index.ts from ~276 to ~5 lines
- Follow modular code architecture principles
🤖 Generated with assistance of OhMyOpenCode
- Update main AGENTS.md with current file sizes
- Update complexity hotspot line counts
- Update agent count from 11 to 32 files
- Update CLI utility count to 70
- Update test file count from 100+ to 163
🤖 Generated with assistance of OhMyOpenCode
- Change PROMETHEUS_PERMISSION bash from 'allow' to 'deny' to prevent unrestricted bash execution
- Prometheus is a read-only planner and should not execute bash commands
- The prometheus-md-only hook provides additional blocking as backup
- Extract getMessageDir to message-dir.ts
- Extract executeBackground to background-executor.ts
- Extract session creation logic to session-creator.ts
- Extract polling logic to completion-poller.ts
- Extract message processing to message-processor.ts
- Create sync-executor.ts to orchestrate sync execution
- Add ToolContextWithMetadata type to types.ts
- tools.ts now <200 LOC and focused on tool definition
Port handoff concept from ampcode as a builtin command that extracts
detailed context summary from current session for seamless continuation
in a new session. Enhanced with programmatic context gathering:
- Add HANDOFF_TEMPLATE with phased extraction (gather programmatic
context via session_read/todoread/git, extract context, format, instruct)
- Gather concrete data: session history, todo state, git diff/status
- Include compaction-style sections: USER REQUESTS (AS-IS) verbatim,
EXPLICIT CONSTRAINTS verbatim, plus all original handoff sections
- Register handoff in BuiltinCommandName type and command definitions
- Include session context variables (SESSION_ID, TIMESTAMP, ARGUMENTS)
- Add 14 tests covering registration, template content, programmatic
gathering, compaction-style sections, and emoji-free constraint
Revert .default([]) on load_skills schema back to required, restore the runtime error for missing load_skills, and add explicit load_skills=[] to all task() examples in agent prompts that were missing it.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
When experimental.task_system is enabled, add todowrite: deny and
todoread: deny to per-agent permissions for all primary agents
(sisyphus, hephaestus, atlas, prometheus, sisyphus-junior).
This ensures the model never sees these tools in its tool list,
complementing the existing global tools config and runtime hook.
Replace promptAsync + manual polling loop with promptSyncWithModelSuggestionRetry
(session.prompt) which blocks until the LLM response completes. This matches
OpenCode's native task tool behavior and fixes empty/broken responses that
occurred when polling declared stability prematurely.
Applied to both executeSyncTask and executeSyncContinuation paths.
Remove 'prometheus' from PLAN_AGENT_NAMES so isPlanAgent() no longer
matches prometheus. The only remaining connection is model inheritance
via buildPlanDemoteConfig() in plan-model-inheritance.ts.
- Remove 'prometheus' from PLAN_AGENT_NAMES array
- Update self-delegation error message to say 'plan agent' not 'prometheus'
- Update tests: prometheus is no longer treated as a plan agent
- Update task permission: only plan agents get task tool, not prometheus
Previously, demoted plan agent only received { mode: 'subagent' } with no
model settings, causing fallback to step-3.5-flash. Now inherits all
model-related settings (model, variant, temperature, top_p, maxTokens,
thinking, reasoningEffort, textVerbosity, providerOptions) from the
resolved prometheus config. User overrides via agents.plan.* take priority.
Prompt, permission, description, and color are intentionally NOT inherited.
When Anthropic models reject requests with 'This model does not support
assistant message prefill', detect this as a recoverable error type and
automatically send 'Continue' once to resume the conversation.
Extends session-recovery hook with new 'assistant_prefill_unsupported'
error type. The existing session.error handler in index.ts already sends
'continue' after successful recovery, so no additional logic needed.
Add targeted regression tests for the exact reproduction scenario from issue #1295:
- quick category with sisyphusJuniorModel override (the reported scenario)
- user-defined custom category with sisyphusJuniorModel fallback
The underlying fix was already applied in PRs #1470 and #1556. These tests
ensure the fix does not regress.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The ralph-loop hook calls promptAsync in the implementation, but the
test mock only defined prompt(). Added promptAsync with identical
behavior to make tests pass.
- All 38 ralph-loop tests now pass
- Total test suite: 2361 pass, 3 fail (unrelated to this change)
PR #1620 migrated all prompt calls from session.prompt (blocking) to
session.promptAsync (fire-and-forget HTTP 204). This broke look_at which
needs the multimodal-looker response to be available immediately after
the prompt call returns.
Fix: add promptSyncWithModelSuggestionRetry() that uses session.prompt
(blocking) with model suggestion retry support. look_at now uses this
sync variant while all other callers keep using promptAsync.
- Add promptSyncWithModelSuggestionRetry to model-suggestion-retry.ts
- Switch look_at from promptWithModelSuggestionRetry to sync variant
- Add comprehensive tests for the new sync function
- No changes to other callers (delegate-task, background-agent)
Convert grep, glob, ast-grep, and session-manager tools from static exports to factory functions that receive PluginInput context. This allows them to use ctx.directory instead of process.cwd(), fixing issue #658 where tools search from wrong directory in OpenCode Desktop app.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Addresses cubic review feedback — resolvedPath may contain
non-canonical segments when filePath is absolute, causing
the startsWith check against sisyphusRoot to fail.
Add isContinuationStopped check to atlas hook's session.idle handler
so boulder continuation stops when user runs /stop-continuation.
Previously, todo continuation and session recovery checked the guard,
but boulder continuation did not — causing work to resume after stop.
Fixes#1575
- Uses path.join(ctx.directory, '.sisyphus') + sep as prefix instead of loose .includes()
- Prevents false positive when .sisyphus exists in parent directories outside project root
- Adds test for the false positive case (cubic review feedback)
On Linux systems, 'sg' is a mailutils command, not ast-grep. The previous
fallback would silently run the wrong binary when ast-grep wasn't found.
Changes:
- getSgCliPath() now returns string | null instead of string
- Fallback changed from 'sg' to null
- Call sites now check for null and return user-facing error with
installation instructions
- checkEnvironment() updated to handle null path
Fixes#1365
- Add _migrations field to OhMyOpenCodeConfigSchema to track applied migrations
- Update migrateModelVersions() to accept appliedMigrations Set and return newMigrations array
- Skip migrations that are already in _migrations (preserves user reverts)
- Update migrateConfigFile() to read/write _migrations field
- Add 8 new tests for migration history tracking
Fixes#1570
Remove duplicate consoleErrorSpy declarations in 'command failure' and
'spawn error' tests that shadowed the outer beforeEach/afterEach-managed
spy. The inner declarations created a second spy on the already-spied
console.error, causing restore confusion and potential test leakage.
The 'port with available port starts server' test was calling
createOpencode from the SDK which spawns an actual opencode binary.
CI environments don't have opencode installed, causing ENOENT.
Mock @opencode-ai/sdk and port-utils (same pattern as
server-connection.test.ts) so the test verifies integration
logic without requiring the binary.
Implement all 5 CLI extension options for external orchestration:
- --port <port>: Start server on port, or attach if port occupied
- --attach <url>: Connect to existing opencode server
- --session-id <id>: Resume existing session instead of creating new
- --on-complete <command>: Execute shell command with env vars on completion
- --json: Output structured RunResult JSON to stdout
Refactor runner.ts into focused modules:
- agent-resolver.ts: Agent resolution logic
- server-connection.ts: Server connection management
- session-resolver.ts: Session create/resume with retry
- json-output.ts: Stdout redirect + JSON emission
- on-complete-hook.ts: Shell command execution with env vars
Fixes#1586
Some models (e.g. kimi-k2.5) return tool names with leading spaces
like ' delegate_task', causing tool matching to fail.
Add .trim() in transformToolName() and defensive trim in claude-code-hooks.
Fixes#1568
Add exception in write-existing-file-guard for .sisyphus/*.md files
so Prometheus can rewrite plan files without being blocked by the guard.
The prometheus-md-only hook (which runs later) still validates that only
Prometheus can write to these paths, preserving security.
Fixes#1576
When user explicitly configures an agent model in oh-my-opencode.json,
that model should take priority over the active model in OpenCode's config
(which may just be the system default, not a deliberate UI selection).
This fixes the issue where user-configured models from plugin providers
(e.g., google/antigravity-*) were being overridden by the fallback chain
because config.model was being passed as uiSelectedModel regardless of
whether the user had an explicit config.
The fix:
- Only pass uiSelectedModel when there's no explicit userModel config
- If user has configured a model, let resolveModelPipeline use it directly
Fixes#1573
Co-authored-by: Rishi Vhavle <rishivhavle21@gmail.com>
When user explicitly configures an agent model in oh-my-opencode.json,
that model should take priority over the active model in OpenCode's config
(which may just be the system default, not a deliberate UI selection).
This fixes the issue where user-configured models from plugin providers
(e.g., google/antigravity-*) were being overridden by the fallback chain
because config.model was being passed as uiSelectedModel regardless of
whether the user had an explicit config.
The fix:
- Only pass uiSelectedModel when there's no explicit userModel config
- If user has configured a model, let resolveModelPipeline use it directly
Fixes#1573
- Add "anthropic-effort" to HookNameSchema enum
- Import and create hook in plugin entry with isHookEnabled guard
- Wire chat.params event handler to invoke the effort hook
- First hook to use the chat.params lifecycle event from plugin
Injects `output_config: { effort: "max" }` via AI SDK's providerOptions
when all conditions are met:
- variant is "max" (sisyphus, prometheus, metis, oracle, unspecified-high, ultrawork)
- model matches claude-opus-4[-.]6 pattern
- provider is anthropic, opencode, or github-copilot (with claude model)
Respects existing effort value if already set. Normalizes model IDs
with dots to hyphens for consistent matching.
Fixes#1521. When hook matcher patterns contained regex special characters
like parentheses, the pattern-matcher would throw 'SyntaxError: Invalid
regular expression: unmatched parentheses' because these characters were
not escaped before constructing the RegExp.
The fix escapes all regex special characters (.+?^${}()|[\]\) EXCEPT
the asterisk (*) which is intentionally converted to .* for glob-style
matching.
Add comprehensive test suite for pattern-matcher covering:
- Exact matching (case-insensitive)
- Wildcard matching (glob-style *)
- Pipe-separated patterns
- All regex special characters (parentheses, brackets, etc.)
- Edge cases (empty matcher, complex patterns)
After model version update (opus-4-5 → opus-4-6), several agents had
identical duplicate fallback entries for the same model. The anthropic-only
entry was a superset covered by the broader providers entry, making it dead
code. Consolidate to single entry with all providers.
Cubic found that registering task-continuation-enforcer recovery callbacks
overrode the todo-continuation-enforcer callbacks. Compose the callbacks
so both enforcers receive abort/recovery notifications.
Mirrors todo-continuation-enforcer but reads from file-based task storage
instead of OpenCode's todo API. Includes 19 tests covering all skip
conditions, abort detection, countdown, and recovery scenarios.
Update plan-generation.ts to guide users to run /start-work with plan name.
For example: /start-work fix-bug instead of just /start-work
This makes it clearer which plan the user wants to execute.
Hephaestus now appears when any of its providers (openai, github-copilot, opencode) is
connected, rather than requiring the exact gpt-5.2-codex model. This allows users with
newer codex models (e.g., gpt-5.3-codex) to use Hephaestus without manual config overrides.
- Add requiresProvider field to ModelRequirement type
- Add isAnyProviderConnected() helper in model-availability
- Update hephaestus config from requiresModel to requiresProvider
- Update cli model-fallback to handle requiresProvider checks
Add claude-opus-4-6 as the first anthropic provider entry before
claude-opus-4-5 across all agent and category fallback chains.
Also add high variant mapping for think-mode switcher.
Gemini 3 Pro only supports 'low' and 'high' thinking levels according to
Google's official API documentation. The 'max' variant is not supported
and would result in API errors.
Changed variant: 'max' -> 'high' for gemini-3-pro in:
- oracle agent
- metis agent
- momus agent
- ultrabrain category
- deep category
- artistry category
Ref: https://ai.google.dev/gemini-api/docs/thinking-modeCloses#1433
Bun has unfixed segfault issues on Windows when spawning subprocesses
(oven-sh/bun#25798, #26026, #23043). Even upgrading to Bun v1.3.6+
does not resolve the crashes.
Instead of blocking LSP on Windows with version checks, use Node.js
child_process.spawn as fallback. This allows LSP to work on Windows
regardless of Bun version.
Changes:
- Add UnifiedProcess interface bridging Bun Subprocess and Node ChildProcess
- Use Node.js spawn on Windows, Bun spawn on other platforms
- Add CWD validation before spawn to prevent libuv null dereference
- Add binary existence pre-check on Windows with helpful error messages
- Enable shell: true for Node spawn on Windows for .cmd/.bat resolution
- Remove ineffective Bun version blocking (v1.3.5 check)
- Add tests for CWD validation and start() error handling
Closes#1047
Ref: oven-sh/bun#25798
Category delegation fails when provider-models.json contains model objects
with metadata (id, provider, context, output) instead of plain strings.
Line 196 in model-availability.ts assumes string[] format, causing:
- Object concatenation: `${providerId}/${modelId}` becomes "ollama/[object Object]"
- Empty availableModels Set passed to resolveModelPipeline()
- Error: "Model not configured for category"
This is the root cause of issue #1508 where delegate_task(category='quick')
fails despite direct agent routing (delegate_task(subagent_type='explore'))
working correctly.
Changes:
- model-availability.ts: Add type check to handle both string and object formats
- connected-providers-cache.ts: Update ProviderModelsCache interface to accept both formats
- model-availability.test.ts: Add 4 test cases for object[] format handling
Direct agent routing bypasses fetchAvailableModels() entirely, explaining why
it works while category routing fails. This fix enables category delegation
to work with manually-populated Ollama model caches.
Fixes#1508
When the main session is aborted while background tasks are running,
notifyParentSession() would attempt to call session.messages() and
session.prompt() on the aborted parent session, causing exceptions
that could crash the TUI.
- Add isAbortedSessionError() helper to detect abort-related errors
- Add abort check in session.messages() catch block with early return
- Add abort check in session.prompt() catch block with early return
- Add test case covering aborted parent session scenario
Fixes TUI crash when aborting main session with running background tasks.
- Changed priority order to: opencode-project > opencode > project > user
(OpenCode Global skills now take precedence over legacy Claude project skills)
- Updated JSDoc comments to reflect correct priority order
- Fixed test to use actual discoverSkills() for deduplication verification
- Changed test assertion from 'source' to 'scope' (correct field name)
- Added port-utils module with isPortAvailable, findAvailablePort, getAvailableServerPort
- Modified runner.ts to automatically find available port if preferred port is busy
- Shows warning message when using auto-selected port
- Eliminates need for manual OPENCODE_SERVER_PORT workaround
ToolContext type from @opencode-ai/plugin/tool does not include
a 'directory' property, causing typecheck failure after rebase from dev.
Changed to use process.cwd() which is the same pattern used in
session-manager/tools.ts.
Adds a PreToolUse hook that intercepts write operations and throws an error
if the target file already exists, guiding users to use the edit tool instead.
- Throws error: 'File already exists. Use edit tool instead.'
- Hook is enabled by default, can be disabled via disabled_hooks
- Includes comprehensive test suite with BDD-style comments
The auto-update-checker was operating on the wrong directory:
- CACHE_DIR (~/.cache/opencode) was used for node_modules, package.json, and bun.lock
- But plugins are installed in USER_CONFIG_DIR (~/.config/opencode)
This caused auto-updates to fail silently:
1. Update detected correctly (3.x.x -> 3.y.y)
2. invalidatePackage() tried to delete from ~/.cache/opencode (wrong!)
3. bun install ran but respected existing lockfile
4. Old version remained installed
Fix: Use USER_CONFIG_DIR consistently for all invalidation operations.
Also moves INSTALLED_PACKAGE_JSON constant to use USER_CONFIG_DIR for consistency.
Address review feedback (P3): The User-Installed Skills block was duplicated verbatim in two if/else branches in both buildCategorySkillsDelegationGuide() and Atlas buildSkillsSection(). Extract shared formatCustomSkillsBlock() with configurable header level (#### vs **) so both builders reference a single source of truth.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Atlas had its own buildSkillsSection() in atlas/utils.ts that rendered all skills in a flat table without distinguishing built-in from user-installed. Apply the same HIGH PRIORITY emphasis and CRITICAL warning pattern used in the shared prompt builder.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Custom skills from .config/opencode/skills/ were visible in agent prompts but the model consistently ignored them when delegating via delegate_task(). The flat skill table made no distinction between built-in and user-installed skills, causing the model to default to built-in ones only.
- Separate skills into 'Built-in Skills' and 'User-Installed Skills (HIGH PRIORITY)' sections in buildCategorySkillsDelegationGuide()
- Add CRITICAL warning naming each custom skill explicitly
- Add priority note: 'When in doubt, INCLUDE rather than omit'
- Show source column (user/project) for custom skills
- Apply same separation in buildUltraworkSection()
- Add 10 unit tests covering all skill combination scenarios
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Root cause fix for issue #927:
- After /plan → /start-work → interruption, in-memory sessionAgentMap is cleared
- getAgentFromMessageFiles() returns 'prometheus' (oldest message from /plan)
- But boulder.json has agent: 'atlas' (set by /start-work)
Fix: Check boulder state agent BEFORE falling back to message files
Priority: in-memory → boulder state → message files
Test: 3 new tests covering the priority logic
Address code review: continuation was blocked unless last agent was Atlas,
making the new agent parameter ineffective. Now the idle handler checks if
the last session agent matches boulderState.agent (defaults to 'atlas'),
allowing non-Atlas agents to resume when properly configured.
- Add getLastAgentFromSession helper for agent lookup
- Replace isCallerOrchestrator gate with boulder-agent-aware check
- Add test for non-Atlas agent continuation scenario
Add 'agent' field to BoulderState to track which agent (atlas) should
resume on session continuation. Previously, when user typed 'continue'
after interruption, Prometheus (planner) resumed instead of Sisyphus
(executor), causing all delegate_task calls to get READ-ONLY mode.
Changes:
- Add optional 'agent' field to BoulderState interface
- Update createBoulderState() to accept agent parameter
- Set agent='atlas' when /start-work creates boulder.json
- Use stored agent on boulder continuation (defaults to 'atlas')
- Add tests for new agent field functionality
OmoConfig interface was missing variant property, causing doctor to show
variants from ModelRequirement fallback chain instead of user's config.
- Add variant to OmoConfig agent/category entries
- Add userVariant to resolution info interfaces
- Update getEffectiveVariant to prioritize user variant
- Add tests verifying variant capture
- Make storage_path truly optional (remove default)
- Add task_list_id as config alternative to env var
- Fix build-schema.ts to use zodToJsonSchema
🤖 Generated with assistance of OhMyOpenCode
Remove outdated configuration section links that no longer exist.
Applies changes from PR #1386 (pierrecorsini).
Co-authored-by: Pierre CORSINI <pierrecorsini@users.noreply.github.com>
- Separate directory and file entries, process directories first
- Use Map to deduplicate skills by name (first-wins)
- Directory skills (SKILL.md, {dir}.md) take precedence over file skills (*.md)
- Add test for collision scenario
Addresses Oracle P2 review feedback from PR #1254
When promptWithModelSuggestionRetry() fails, the session was not being aborted, causing the polling loop to wait forever for an idle state. Added session.abort() calls in startTask() and resume() catch blocks.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Hephaestus: #FF4500 (Magma Orange) → #708090 (Slate Gray)
Blacksmith's hammer/iron theme, visible in both light and dark modes
- Prometheus: #9D4EDD (Amethyst Purple) → #FF5722 (Deep Orange)
Fire/flame theme, restoring the original fire color concept
Closes#704
Add support for base64-encoded image data in the look_at tool,
enabling analysis of clipboard/pasted images without requiring
a file path.
Changes:
- Add optional image_data parameter to LookAtArgs type
- Update validateArgs to accept either file_path or image_data
- Add inferMimeTypeFromBase64 function to detect image format
- Add try/catch around atob() to handle invalid base64 gracefully
- Update execute to handle both file path and data URL inputs
- Add comprehensive tests for image_data functionality
call_omo_agent is for lightweight exploration agents (explore, librarian).
metis/momus are consultation agents that should be invoked via delegate_task.
Reverts part of #1462 that incorrectly added metis/momus to call_omo_agent.
- Add metis and momus to AGENT_RESTRICTIONS with same pattern as oracle
- Deny write, edit, task, and delegate_task tools
- Enforces read-only design for these advisor agents
- Addresses cubic review feedback on #1462
* fix(model-requirements): use supported variant for gemini-3-pro
* fix(delegate-task): update artistry variant to high for gemini-3-pro
- Update DEFAULT_CATEGORIES artistry variant from 'max' to 'high'
- Update related test comment
- gemini-3-pro only supports low/high thinking levels, not max
- Addresses Oracle review feedback
* fix(model-availability): prefer exact model ID match in fuzzyMatchModel
* fix(model-availability): use filter+shortest for multi-provider tie-break
- Change Priority 2 from find() to filter()+reduce()
- Preserves shortest-match tie-break when multiple providers share model ID
- Add test for multi-provider same model ID case
- Addresses Oracle review feedback
The shellType was hardcoded to 'unix' which breaks on native Windows shells
(cmd.exe, PowerShell) when running without Git Bash or WSL.
This change uses the existing detectShellType() function to dynamically
determine the correct shell type, enabling proper env var syntax for all
supported shell environments.
createWebsearchConfig was called eagerly before checking disabledMcps,
causing Tavily missing-key error even when websearch was disabled.
Now each MCP is only created if not in disabledMcps list.
Previously tests were tautological - they defined local logic
instead of invoking the actual implementation. Now all tests
properly exercise createWebsearchConfig.
- Add skipNotification option to cancelTask method
- Apply skipNotification to background_cancel tool
- Prevents unwanted notifications when user cancels via tool
- Document TaskCreate, TaskGet, TaskList, TaskUpdate tools
- Note that these tools follow Claude Code internal specs but are not officially documented by Anthropic
- Include schema, dependency system, and usage examples
Plan agent demote now only sets mode to 'subagent' without spreading
prometheus config. This ensures plan agent uses OpenCode's default
prompt instead of inheriting prometheus prompt.
* fix(prometheus): enforce path constraints and atomic write protocol
- Add FORBIDDEN PATHS section blocking docs/, plan/, plans/ directories
- Add SINGLE ATOMIC WRITE protocol to prevent content loss from multiple writes
- Simplify PROMETHEUS_AGENTS array to single PROMETHEUS_AGENT string
* fix: reconcile Edit tool signature in interview-mode.ts with identity-constraints.ts
Identified by cubic: Edit tool usage was inconsistent between files.
- interview-mode.ts showed: Edit(path, content)
- identity-constraints.ts showed: Edit(path, oldString="...", newString="...")
Updated interview-mode.ts to use the correct Edit signature with oldString and newString parameters to match the actual tool API and prevent agent hallucination.
* refactor(background-agent): optimize cache timer lifecycle and result handling
Ultraworked with Sisyphus
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* refactor(background-task): simplify tool implementation and expand test coverage
Ultraworked with Sisyphus
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* fix(background-task): fix BackgroundCancel tool parameter handling
Correct parameter names and types in BackgroundCancel tool to match actual usage patterns. Add comprehensive test coverage for parameter validation.
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Restructure prompt with XML tags for better instruction adherence
- Add output_verbosity_spec with concrete limits (≤7 steps, ≤3 sentences)
- Add uncertainty_and_ambiguity section with decision tree
- Add scope_discipline to prevent scope drift
- Add tool_usage_rules for efficient tool calling
- Add high_risk_self_check for architecture/security answers
- Add long_context_handling for large code inputs
- Update context to support session continuation follow-ups
- Convert github-pr-triage to streaming architecture (process PRs one-by-one with immediate reporting)
- Convert github-issue-triage to streaming architecture (process issues one-by-one with real-time updates)
- Add mandatory initial todo registration for both triage skills
- Add phase-by-phase todo status updates
- Generate final comprehensive report at the end
- Show live progress every 5 items during processing
Split monolithic atlas.ts into modular structure: index.ts (routing), default.ts (Claude-optimized), gpt.ts (GPT-optimized), utils.ts (shared utilities). Atlas now routes to appropriate prompt based on model type instead of overriding model settings.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* fix(shared): honor agent variant overrides
* test(shared): use model in fallback chain to verify override precedence
Address PR review: test now uses claude-opus-4-5 (which has default
variant 'max' in sisyphus chain) to properly verify that agent override
'high' takes precedence over the fallback chain's default variant.
Remove duplicate removeCodeBlocks() call in keyword-detector/index.ts.
The detectKeywordsWithType() function already calls removeCodeBlocks() internally, so calling it before passing the text was redundant and caused unnecessary double processing.
- Add syncTaskTodoUpdate function for immediate todo updates
- Integrate with TaskCreate and TaskUpdate tools
- Preserve existing todos when updating single task
- Add comprehensive tests for new sync function
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
In CLI run mode there is no TUI to answer questions, so the Question
tool would hang forever. This sets OPENCODE_CLI_RUN_MODE env var in
runner.ts and config-handler uses it to set question permission to
deny for sisyphus, hephaestus, and prometheus agents.
Add unit tests for resolveRunAgent() covering:
- CLI flag takes priority over env and config
- Env var takes priority over config
- Config takes priority over default
- Falls back to sisyphus when none set
- Skips disabled agent and picks next available core agent
Add resolveRunAgent() to determine agent with priority:
1. CLI --agent flag (highest)
2. OPENCODE_DEFAULT_AGENT environment variable
3. oh-my-opencode.json 'default_run_agent' config
4. 'sisyphus' (fallback)
Features:
- Case-insensitive agent name matching
- Warn and fallback when requested agent is disabled
- Pick next available core agent when default is disabled
Add optional 'default_run_agent' field to OhMyOpenCodeConfig schema.
This field allows users to configure the default agent for the 'run' command
via oh-my-opencode.json configuration file.
- Add Zod schema: z.string().optional()
- Regenerate JSON schema for IDE support
- Add github-pr-triage skill with conservative auto-close logic
- Update github-issue-triage ratio to 7:2:1 (unspecified-low:quick:writing)
- Add gh_fetch.py script for exhaustive GitHub pagination (issues/PRs)
- Script bundled in both skills + available standalone in uvscripts/
- Add task ID pattern validation (T-[A-Za-z0-9-]+) to prevent path traversal
- Refactor lock mechanism to use UUID-based IDs for reliable ownership tracking
- Implement atomic lock creation with stale lock detection and cleanup
- Add lock acquisition checks in create/update/delete handlers
- Expand task-reminder hook to track split tool names and clean up on session deletion
- Add comprehensive test coverage for validation and lock handling
* chore: pin bun-types to 1.3.6
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* chore: exclude test files and script from tsconfig
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* refactor: remove sisyphus-swarm feature
Remove mailbox types and swarm config schema. Update docs.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* refactor: remove legacy sisyphus-tasks feature
Remove old storage and types implementation, replaced by claude-tasks.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(claude-tasks): add task schema and storage utilities
- Task schema with Zod validation (pending, in_progress, completed, deleted)
- Storage utilities: getTaskDir, readJsonSafe, writeJsonAtomic, acquireLock
- Atomic writes with temp file + rename
- File-based locking with 30s stale threshold
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(tools/task): add task object schemas
Add Zod schemas for task CRUD operations input validation.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(tools): add TaskCreate tool
Create new tasks with sequential ID generation and lock-based concurrency.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(tools): add TaskGet tool
Retrieve task by ID with null-safe handling.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(tools): add TaskUpdate tool with claim validation
Update tasks with status transitions and owner claim validation.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(tools): add TaskList tool and exports
- TaskList for summary view of all tasks
- Export all claude-tasks tool factories from index
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(hooks): add task-reminder hook
Remind agents to use task tools after 10 turns without task operations.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(config): add disabled_tools setting and tasks-todowrite-disabler hook
- Add disabled_tools config option to disable specific tools by name
- Register tasks-todowrite-disabler hook name in schema
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(config-handler): add task_* and teammate tool permissions
Grant task_* and teammate permissions to atlas, sisyphus, prometheus, and sisyphus-junior agents.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* feat(delegate-task): add execute option for task execution
Add optional execute field with task_id and task_dir for task-based delegation.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* fix(truncator): add type guard for non-string outputs
Prevent crashes when output is not a string by adding typeof checks.
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* chore: export config types and update task-resume-info
- Export SisyphusConfig and SisyphusTasksConfig types
- Add task_tool to TARGET_TOOLS list
🤖 Generated with [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
* refactor(storage): remove team namespace, use flat task directory
* feat(task): implement unified task tool with all 5 actions
* fix(hooks): update task-reminder to track unified task tool
* refactor(tools): register unified task tool, remove 4 separate tools
* chore(cleanup): remove old 4-tool task implementation
* refactor(config): use new_task_system_enabled as top-level flag
- Add new_task_system_enabled to OhMyOpenCodeConfigSchema
- Remove enabled from SisyphusTasksConfigSchema (keep storage_path, claude_code_compat)
- Update index.ts to gate on new_task_system_enabled
- Update plugin-config.ts default for config initialization
- Update test configs in task.test.ts and storage.test.ts
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* fix: resolve typecheck and test failures
- Add explicit ToolDefinition return type to createTask function
- Fix planDemoteConfig to use 'subagent' mode instead of 'all'
---------
Co-authored-by: justsisyphus <justsisyphus@users.noreply.github.com>
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Unify slot ownership: processKey() owns the slot until task.concurrencyKey is set.
Previously, startTask() released the slot on errors (session.create catch,
createResult.error), but processKey() catch block didn't release, causing slot
leaks when errors occurred between acquire() and task.concurrencyKey assignment.
Changes:
- Remove all pre-transfer release() calls in startTask()
- Add conditional release in processKey() catch: only if task.concurrencyKey not set
- Add validation for createResult.data?.id to catch malformed API responses
This fixes 'Task failed to start within timeout' errors caused by exhausted
concurrency slots that were never released.
GitHub Copilot uses gemini-3-pro-preview and gemini-3-flash-preview as
the official model identifiers. The CLI installer was generating config
with incorrect names (gemini-3-pro, gemini-3-flash).
Reported by user: the install command was creating config with wrong
model names that don't work with GitHub Copilot API.
Remove the 'enabled' flag from babysitting config - the hook now runs
automatically when not disabled via disabled_hooks. This simplifies
configuration and makes the unstable model monitoring a default behavior.
BREAKING CHANGE: babysitting.enabled config option is removed. Use
disabled_hooks: ['unstable-agent-babysitter'] to disable the hook instead.
- Change session title from 'Task: {desc}' to '{desc} (@{agent} subagent)'
- Move session_id to structured <task_metadata> block for better parsing
- Add category tracking to BackgroundTask type and LaunchInput
- Add tests for new title format and metadata block
* refactor(keyword-detector): split constants into domain-specific modules
* feat(shared): add requiresAnyModel and isAnyFallbackModelAvailable
* feat(config): add hephaestus to agent schemas
* feat(agents): add Hephaestus autonomous deep worker
* feat(cli): update model-fallback for hephaestus support
* feat(plugin): add hephaestus to config handler with ordering
* test(delegate-task): update tests for hephaestus agent
* docs: update AGENTS.md files for hephaestus
* docs: add hephaestus to READMEs
* chore: regenerate config schema
* fix(delegate-task): bypass requiresModel check when user provides explicit config
* docs(hephaestus): add 4-part context structure for explore/librarian prompts
* docs: fix review comments from cubic (non-breaking changes)
- Move Hephaestus from Primary Agents to Subagents (uses own fallback chain)
- Fix Hephaestus fallback chain documentation (claude-opus-4-5 → gemini-3-pro)
- Add settings.local.json to claude-code-hooks config sources
- Fix delegate_task parameters in ultrawork prompt (agent→subagent_type, background→run_in_background, add load_skills)
- Update line counts in AGENTS.md (index.ts: 788, manager.ts: 1440)
* docs: fix additional documentation inconsistencies from oracle review
- Fix delegate_task parameters in Background Agents example (docs/features.md)
- Fix Hephaestus fallback chain in root AGENTS.md to match model-requirements.ts
* docs: clarify Hephaestus has no fallback (requires gpt-5.2-codex only)
Hephaestus uses requiresModel constraint - it only activates when gpt-5.2-codex
is available. The fallback chain in code is unreachable, so documentation
should not mention fallbacks.
* fix(hephaestus): remove unreachable fallback chain entries
Hephaestus has requiresModel: gpt-5.2-codex which means the agent only
activates when that specific model is available. The fallback entries
(claude-opus-4-5, gemini-3-pro) were unreachable and misleading.
---------
Co-authored-by: justsisyphus <justsisyphus@users.noreply.github.com>
Replace platform-specific 'which'/'where' commands with cross-platform Bun.which() API to fix Windows compatibility issues and simplify code.
Fixes:
- #1027: Comment-checker binary crashes on Windows (missing 'check' subcommand)
- #1036: Session-notification listens to non-existent events
- #1033: Infinite loop in session notifications
- #599: Doctor incorrectly reports OpenCode as not installed on Windows
- #1005: PowerShell path detection corruption on Windows
Changes:
- Use Bun.which() instead of spawning 'which'/'where' commands
- Add 'check' subcommand to comment-checker invocation
- Remove non-existent event listeners (session.updated, message.created)
- Prevent notification commands from resetting their own state
- Fix edge case: clear notifiedSessions if activity occurs during notification
All changes are cross-platform compatible and tested on Windows/Linux/macOS.
- Extract prompt_append from override and append to prompt instead of shallow spread
- Add test verifying prompt_append is appended, not overwriting base prompt
- Fixes#723
Co-authored-by: Gabriel Ečegi <gabriel-ecegi@users.noreply.github.com>
* fix(tmux): send Ctrl+C before kill-pane and respawn-pane to prevent orphaned processes
* fix(tmux-subagent): prevent premature pane closure with stability detection
Implements stability detection pattern from background-agent to prevent
tmux panes from closing while agents are still working (issue #1330).
Problem: Session status 'idle' doesn't mean 'finished' - agent may still
be thinking/reasoning. Previous code closed panes immediately on idle.
Solution:
- Require MIN_STABILITY_TIME_MS (10s) before stability detection activates
- Track message count changes to detect ongoing activity
- Require STABLE_POLLS_REQUIRED (3) consecutive polls with same message count
- Double-check session status before closing
Changes:
- types.ts: Add lastMessageCount and stableIdlePolls to TrackedSession
- manager.ts: Implement stability detection in pollSessions()
- manager.test.ts: Add 4 tests for stability detection behavior
* test(tmux-subagent): improve stability detection tests to properly verify age gate
- First test now sets session age >10s and verifies 3 polls don't close
- Last test now does 5 polls to prove age gate prevents closure
- Added comments explaining what each poll does
- Add stubNotifyParentSession implementation to stub manager's notifyParentSession method
- Add stubNotifyParentSession calls to checkAndInterruptStaleTasks tests
- Add messages mock to client mocks for completeness
- Fix timer-based tests by using real timers (fakeTimers.restore) with wait()
- Increase timeout for tests that need real time delays
Remove isNonInteractive() check that was incorrectly added in PR #573.
The check prevented env var injection when OpenCode runs in a TTY,
causing git commands like 'git rebase --continue' to open editors (nvim)
that hang forever. The agent cannot interact with spawned bash processes
regardless of whether OpenCode itself is in a TTY.
Add CONTEXT + GOAL + QUESTION + REQUEST structure to agent delegation examples.
This guides users to provide richer context when invoking explore/librarian agents.
- Add thinking_max_chars?: number to BackgroundOutputOptions type
- Add thinking_max_chars argument to background_output tool schema
- Add formatFullSession option for controlling output format
- Add 2 tests for thinking_max_chars functionality
- Use nick-fields/retry@v3 for Build binary step
- 5 minute timeout per attempt
- Max 5 attempts with 10s wait between retries
- Prevents infinite hang on Bun cross-compile network issues
- Remove non-functional batch tool handling (OpenCode has no batch tool)
- Keep working direct tool call path (read/write/edit/multiedit)
- Apply same cleanup to directory-agents-injector and directory-readme-injector
- Add .sisyphus/rules directory support
- Replace setMainSession(undefined) with _resetForTesting() in keyword-detector tests
- Add _resetForTesting() to afterEach hooks for proper cleanup
- Un-skip the previously flaky mainSessionID test in state.test.ts
Fixes#848
Co-authored-by: 배지훈 <new0126@naver.com>
Track setTimeout timers in notifyParentSession using a completionTimers Map.
Clear all timers on shutdown() and when tasks are deleted via session.deleted.
This prevents the BackgroundManager instance from being held in memory by
uncancelled timer callbacks.
Fixes#1043
Co-authored-by: sisyphus-dev-ai <sisyphus-dev-ai@users.noreply.github.com>
* fix: prevent zombie processes with proper process lifecycle management
- Await proc.exited for fire-and-forget spawns in tmux-utils.ts
- Remove competing process.exit() calls from LSP client and skill-mcp-manager
signal handlers to let background-agent manager coordinate final exit
- Await process exit after kill() in interactive-bash timeout handler
- Await process exit after kill() in LSP client stop() method
These changes ensure spawned processes are properly reaped and prevent
orphan/zombie processes when running with tmux integration.
* fix: address Copilot review comments on process cleanup
- LSP cleanup: use async/sync split with Promise.allSettled for proper subprocess cleanup
- LSP stop(): make idempotent by nulling proc before await to prevent race conditions
- Interactive-bash timeout: use .then()/.catch() pattern instead of async callback to avoid unhandled rejections
- Skill-mcp-manager: use void+catch pattern for fire-and-forget signal handlers
* fix: address remaining Copilot review comments
- interactive-bash: reject timeout immediately, fire-and-forget zombie cleanup
- skill-mcp-manager: update comments to accurately describe signal handling strategy
* fix: address additional Copilot review comments
- LSP stop(): add 5s timeout to prevent indefinite hang on stuck processes
- tmux-utils: log warnings when pane title setting fails (both spawn/replace)
- BackgroundManager: delay process.exit() to next tick via setImmediate to allow other signal handlers to complete cleanup
* fix: address code review findings
- Increase exit delay from setImmediate to 100ms setTimeout to allow async cleanup
- Use asyncCleanup for SIGBREAK on Windows for consistency with SIGINT/SIGTERM
- Add try/catch around stderr read in spawnTmuxPane for consistency with replaceTmuxPane
* fix: address latest Copilot review comments
- LSP stop(): properly clear timeout when proc.exited wins the race
- BackgroundManager: use process.exitCode before delayed exit for cleaner shutdown
- spawnTmuxPane: remove redundant log import, reuse existing one
* fix: address latest Copilot review comments
- LSP stop(): escalate to SIGKILL on timeout, add logging
- tmux spawnTmuxPane/replaceTmuxPane: drain stderr immediately to avoid backpressure
* fix: address latest Copilot review comments
- Add .catch() to asyncCleanup() signal handlers to prevent unhandled rejections
- Await proc.exited after SIGKILL with 1s timeout to confirm termination
* fix: increase exit delay to 6s to accommodate LSP cleanup
LSP cleanup can take up to 5s (timeout) + 1s (SIGKILL wait), so the exit
delay must be at least 6s to ensure child processes are properly reaped.
- Clear stop state when user sends new message (chat.message handler)
- Add isContinuationStopped check to session error recovery block
- Continuation resumes automatically after user interaction
- Add critical warnings about using --limit 500 instead of 100
- Add verification checklist before proceeding to Phase 2
- Add severity levels to anti-patterns (CRITICAL/HIGH/MEDIUM)
- Emphasize counting results and fetching additional pages if needed
* fix: resolve deadlock in config handler during plugin initialization
The config handler and createBuiltinAgents were calling fetchAvailableModels
with client, which triggers client.provider.list() API call to OpenCode server.
This caused a deadlock because:
- Plugin initialization waits for server response
- Server waits for plugin init to complete before handling requests
Now using cache-only mode by passing undefined instead of client.
If cache is unavailable, the fallback chain will use the first model.
Fixes#1301
* test: add regression tests for deadlock prevention in fetchAvailableModels
Add tests to ensure fetchAvailableModels is called with undefined client
during plugin initialization. This prevents regression on issue #1301.
- config-handler.test.ts: verify config handler does not pass client
- utils.test.ts: verify createBuiltinAgents does not pass client
* test: restore spies in utils.test.ts to prevent test pollution
Add mockRestore() calls for all spies created in test cases to ensure proper cleanup between tests and prevent state leakage.
* test: restore fetchAvailableModels spy
---------
Co-authored-by: robin <robin@watcha.com>
Co-authored-by: justsisyphus <justsisyphus@users.noreply.github.com>
When cwd equals home directory, ~/.claude/settings.json was being loaded
twice (once as home config and once as cwd config), causing hooks like
Stop to execute twice.
This adds deduplication using Set to ensure each config file is only
loaded once.
- Reduce prompt from 392 to 125 lines
- Add APPROVAL BIAS: approve by default, reject only for blockers
- Limit max 3 issues per rejection to prevent overwhelming feedback
- Remove 'ruthlessly critical' tone, add 'practical reviewer' approach
- Add explicit anti-patterns section for what NOT to reject
- Define 'good enough' criteria (80% clear = pass)
- Update tests to match simplified prompt structure
* fix: exclude prompt/permission from plan agent config
plan agent should only inherit model settings from prometheus,
not the prompt or permission. This ensures plan agent uses
OpenCode's default behavior while only overriding the model.
* test(todo-continuation-enforcer): use FakeTimers for 15x faster tests
- Add custom FakeTimers implementation (~100 lines)
- Replace all real setTimeout waits with fakeTimers.advanceBy()
- Test time: 104.6s → 7.01s
* test(callback-server): fix race conditions with Promise.all and Bun.fetch
- Use Bun.fetch.bind(Bun) to avoid globalThis.fetch mock interference
- Use Promise.all pattern for concurrent fetch/waitForCallback
- Add Bun.sleep(10) in afterEach for port release
* test(concurrency): replace placeholder assertions with getCount checks
Replace 6 meaningless expect(true).toBe(true) assertions with
actual getCount() verifications for test quality improvement
* refactor(config-handler): simplify planDemoteConfig creation
Remove unnecessary IIFE and destructuring, use direct spread instead
* test(executor): use FakeTimeouts for faster tests
- Add custom FakeTimeouts implementation
- Replace setTimeout waits with fakeTimeouts.advanceBy()
- Test time reduced from ~26s to ~6.8s
* test: fix gemini model mock for artistry unstable mode
* test: fix model list mock payload shape
* test: mock provider models for artistry category
---------
Co-authored-by: justsisyphus <justsisyphus@users.noreply.github.com>
- fetchAvailableModels now falls back to client.model.list() when cache is empty
- provider-models cache empty → models.json → client API (3-tier fallback)
- look-at tool explicitly passes registered agent's model to session.prompt
- Ensures multimodal-looker uses correctly resolved model (e.g., gemini-3-flash-preview)
- Add comprehensive tests for fuzzy matching and fallback scenarios
- Add kimiForCoding field to ProviderAvailability interface
- Add kimi-for-coding provider mapping in isProviderAvailable
- Include kimi-for-coding in Sisyphus fallback chain for non-max plan
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
- Remove uiSelectedModel from Atlas model resolution (use k2p5 as primary)
- Always overwrite provider-models.json on session start to prevent stale cache
- Add oracle vs artistry distinction in MANDATORY CERTAINTY PROTOCOL
- Update WHEN IN DOUBT examples with both delegation options
- Add artistry to IF YOU ENCOUNTER A BLOCKER section
- Add 'Hard problem (non-conventional)' row to AGENTS UTILIZATION table
- Update analyze-mode message with artistry specialist option
Oracle: conventional problems (architecture, debugging, complex logic)
Artistry: non-conventional problems (different approach needed)
- MUST search existing codebase for patterns before writing code
- MUST match project's existing conventions
- MUST write readable, human-friendly code
- Add variant: max to ultrabrain's gemini-3-pro fallback entry
- Rename STRATEGIC_CATEGORY_PROMPT_APPEND to ULTRABRAIN_CATEGORY_PROMPT_APPEND
- Keep original strategic advisor prompt content (no micromanagement instructions)
- Update description: use only for genuinely hard tasks, give clear goals only
- Update tests to match renamed constant
Subagents (explore, librarian, oracle, etc.) now use their own fallback
chain instead of inheriting the UI-selected model. This fixes the issue
where explore agent was incorrectly using Opus instead of Haiku.
- Add AgentMode type and static mode property to AgentFactory
- Each agent declares its own mode via factory.mode = MODE pattern
- createBuiltinAgents() checks source.mode before passing uiSelectedModel
setup-node with registry-url injects NODE_AUTH_TOKEN secret which is revoked.
Create .npmrc manually with empty _authToken to force OIDC authentication.
- Remove registry-url from setup-node (was injecting NODE_AUTH_TOKEN)
- Add npm version check and auto-upgrade for OIDC support (11.5.1+)
- Add explicit --registry flag to npm publish
- Remove empty NODE_AUTH_TOKEN/NPM_CONFIG_USERCONFIG env vars that were breaking OIDC
Add recursive directory scanning to discover skills in nested directories
like superpowers (e.g., skills/superpowers/brainstorming/SKILL.md).
Changes:
- Add namePrefix, depth, and maxDepth parameters to loadSkillsFromDir
- Recurse into subdirectories when no SKILL.md found at current level
- Construct hierarchical skill names (e.g., 'superpowers/brainstorming')
- Limit recursion depth to 2 levels to prevent infinite loops
This enables compatibility with the superpowers plugin which installs
skills as: ~/.config/opencode/skills/superpowers/ -> superpowers/skills/
Fixes skill discovery for nested directory structures.
On Windows with Bun v1.3.5 and earlier, spawning LSP servers causes
a segmentation fault crash. This is a known Bun bug fixed in v1.3.6.
Added version check before LSP server spawn that:
- Detects Windows + affected Bun versions (< 1.3.6)
- Throws helpful error with upgrade instructions instead of crashing
- References the Bun issue for users to track
Closes#1047
Add thinking mode support for Z.AI's GLM-4.7 model via the zai-coding-plan provider.
Changes:
- Add zai-coding-plan to THINKING_CONFIGS with extra_body.thinking config
- Add glm pattern to THINKING_CAPABLE_MODELS
- Add comprehensive tests for GLM thinking mode
GLM-4.7 uses OpenAI-compatible API with extra_body wrapper for thinking:
- thinking.type: 'enabled' or 'disabled'
- thinking.clear_thinking: false (Preserved Thinking mode)
Closes#1030
2026-01-24 16:45:34 +09:00
1542 changed files with 160694 additions and 45635 deletions
- label:I am using the latest version of oh-my-opencode
required:true
- label:I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme)
- label:I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme) or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer
- label:This feature request is specific to oh-my-opencode (not OpenCode core)
required:true
- label:I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme)
- label:I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme) or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer
- label:I have searched existing issues and discussions
required:true
- label:I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme)
- label:I have read the [documentation](https://github.com/code-yeongyu/oh-my-opencode#readme) or asked an AI coding agent with this project's GitHub URL loaded and couldn't find the answer
required:true
- label:This is a question (not a bug report or feature request)
Thank you for your contribution! Before we can merge this PR, we need you to sign our [Contributor License Agreement (CLA)](https://github.com/code-yeongyu/oh-my-opencode/blob/master/CLA.md).
for platform in darwin-arm64 darwin-x64 linux-x64 linux-arm64 linux-x64-musl linux-arm64-musl windows-x64; do
for platform in darwin-arm64 darwin-x64 darwin-x64-baseline linux-x64 linux-x64-baseline linux-arm64 linux-x64-musl linux-x64-musl-baseline linux-arm64-musl windows-x64 windows-x64-baseline; do
jq --arg v "$VERSION" '.version = $v' "packages/${platform}/package.json" > tmp.json
> You do NOT need to write any of this. It's handled.
>
> **For a patch release**, this is usually sufficient on its own. However, if there are notable bug fixes or changes worth highlighting, an enhanced summary can be added.
> **For a minor/major release**, an enhanced summary is **required** — I'll draft one in the next step.
Wait for the user to acknowledge before proceeding.
</agent-instruction>
---
EOF
## STEP 6: DRAFT ENHANCED RELEASE SUMMARY
# Append existing body EXACTLY as-is (zero modifications)
| **patch** | ASK the user: "Would you like me to draft an enhanced summary highlighting the key bug fixes / changes? Or is the auto-generated changelog sufficient?" If user declines → skip to Step 8. If user accepts → draft a concise bug-fix / change summary below. |
| **minor** | MANDATORY. Draft a concise feature summary. Do NOT proceed without one. |
| **major** | MANDATORY. Draft a full release narrative with migration notes if applicable. Do NOT proceed without one. |
</decision-gate>
### What You're Writing (and What You're NOT)
You are writing the **headline layer** — a product announcement that sits ABOVE the auto-generated commit log. Think "release blog post", not "git log".
<rules>
- NEVER duplicate commit messages. The auto-generated section already lists every commit.
- NEVER write generic filler like "Various bug fixes and improvements" or "Several enhancements".
- ALWAYS focus on USER IMPACT: what can users DO now that they couldn't before?
- ALWAYS group by THEME or CAPABILITY, not by commit type (feat/fix/refactor).
- ALWAYS use concrete language: "You can now do X" not "Added X feature".
</rules>
<examples>
<bad title="Commit regurgitation — DO NOT do this">
## What's New
- feat(auth): add JWT refresh token rotation
- fix(auth): handle expired token edge case
- refactor(auth): extract middleware
</bad>
<good title="User-impact narrative — DO this">
## 🔐 Smarter Authentication
Token refresh is now automatic and seamless. Sessions no longer expire mid-task — the system silently rotates credentials in the background. If you've been frustrated by random logouts, this release fixes that.
</good>
<bad title="Vague filler — DO NOT do this">
## Improvements
- Various performance improvements
- Bug fixes and stability enhancements
</bad>
<good title="Specific and measurable — DO this">
## ⚡ 3x Faster Rule Parsing
Rules are now cached by file modification time. If your project has 50+ rule files, you'll notice startup is noticeably faster — we measured a 3x improvement in our test suite.
</good>
</examples>
### Drafting Process
1.**Analyze** the commit list from Step 5's preview. Identify 2-5 themes that matter to users.
2.**Write** the summary to `/tmp/release-summary-v${NEW_VERSION}.md`.
3.**Present** the draft to the user for review and approval before applying.
**CRITICAL: This is ADDITIVE ONLY. You are adding your notes on top. The existing content remains 100% intact.**
<agent-instruction>
After drafting, ask the user:
> "Here's the release summary I drafted. This will appear AT THE TOP of the release notes, above the auto-generated commit changelog and contributor thanks. Want me to adjust anything before applying?"
Do NOT proceed to Step 7 without user confirmation.
</agent-instruction>
---
## STEP 7: APPLY ENHANCED SUMMARY TO RELEASE
**Skip this step ONLY if the user opted out of the enhanced summary in Step 6** — proceed directly to Step 8.
<architecture>
The final release note structure:
```
┌─────────────────────────────────────┐
│ Enhanced Summary (from Step 6) │ ← You wrote this
│ - Theme-based, user-impact focused │
├─────────────────────────────────────┤
│ --- (separator) │
├─────────────────────────────────────┤
│ Auto-generated Commit Changelog │ ← Workflow wrote this
│ - feat/fix/refactor grouped │
│ - Contributor thank-you messages │
└─────────────────────────────────────┘
```
</architecture>
<zero-content-loss-policy>
- Fetch the existing release body FIRST
- PREPEND your summary above it
- The existing auto-generated content must remain 100% INTACT
- NOT A SINGLE CHARACTER of existing content may be removed or modified
</zero-content-loss-policy>
```bash
# 1. Fetch existing auto-generated body
EXISTING_BODY=$(gh release view "v${NEW_VERSION}" --json body --jq '.body')
# 2. Combine: enhanced summary on top, auto-generated below
@@ -3,337 +3,216 @@ description: Remove unused code from this project with ultrawork mode, LSP-verif
---
<command-instruction>
You are a dead code removal specialist. Execute the FULL dead code removal workflow using ultrawork mode.
Your core weapon: **LSP FindReferences**. If a symbol has ZERO external references, it's dead. Remove it.
Dead code removal via massively parallel deep agents. You are the ORCHESTRATOR — you scan, verify, batch, then delegate ALL removals to parallel agents.
## CRITICAL RULES
<rules>
- **LSP is law.** Verify with `LspFindReferences(includeDeclaration=false)` before ANY removal decision.
- **You do NOT remove code yourself.** You scan, verify, batch, then fire deep agents. They do the work.
</rules>
1.**LSP is law.** Never guess. Always verify with `LspFindReferences` before removing ANYTHING.
2.**One removal = one commit.** Every dead code removal gets its own atomic commit.
3.**Test after every removal.** Run `bun test` after each. If it fails, REVERT and skip.
4.**Leaf-first order.** Remove deepest unused symbols first, then work up the dependency chain. Removing a leaf may expose new dead code upstream.
5.**Never remove entry points.**`src/index.ts`, `src/cli/index.ts`, test files, config files, and files in `packages/` are off-limits unless explicitly targeted.
<false-positive-guards>
NEVER mark as dead:
- Symbols in `src/index.ts` or barrel `index.ts` re-exports
- Symbols referenced in test files (tests are valid consumers)
prompt="Find files in src/ NOT imported by any other file. Check all import statements. EXCLUDE: index.ts, *.test.ts, entry points, .md, packages/. Return: file paths.")
prompt="Find exported functions/types/constants in src/ that are never imported by other files. Cross-reference: for each export, grep the symbol name across src/ — if it only appears in its own file, it's a candidate. EXCLUDE: src/index.ts exports, test files. Return: file path, line, symbol name, export type.")
```
**After removal, also clean up:**
- Remove any imports that were ONLY used by the removed code
- Remove any now-empty import statements
- Fix any trailing whitespace / double blank lines left behind
Files in the same directory CAN be batched together (they won't conflict as long as no two agents edit the same file). Maximize batch count for parallelism.
description: "Read-only GitHub triage for issues AND PRs. 1 item = 1 background task (category: quick). Analyzes all open items and writes evidence-backed reports to /tmp/{datetime}/. Every claim requires a GitHub permalink as proof. NEVER takes any action on GitHub - no comments, no merges, no closes, no labels. Reports only. Triggers: 'triage', 'triage issues', 'triage PRs', 'github triage'."
---
# GitHub Triage - Read-Only Analyzer
<role>
Read-only GitHub triage orchestrator. Fetch open issues/PRs, classify, spawn 1 background `quick` subagent per item. Each subagent analyzes and writes a report file. ZERO GitHub mutations.
description: "Nuclear-grade 16-agent pre-publish release gate. Runs /get-unpublished-changes to detect all changes since last npm release, spawns up to 10 ultrabrain agents for deep per-change analysis, invokes /review-work (5 agents) for holistic review, and 1 oracle for overall release synthesis. Use before EVERY npm publish. Triggers: 'pre-publish review', 'review before publish', 'release review', 'pre-release review', 'ready to publish?', 'can I publish?', 'pre-publish', 'safe to publish', 'publishing review', 'pre-publish check'."
---
# Pre-Publish Review — 16-Agent Release Gate
Three-layer review before publishing to npm. Every layer covers a different angle — together they catch what no single reviewer could.
| Layer | Agents | Type | What They Check |
|-------|--------|------|-----------------|
| Per-Change Deep Dive | up to 10 | ultrabrain | Each logical change group individually — correctness, edge cases, pattern adherence |
{GROUP_COMMITS — hash and message for each commit in this group}
</commits>
<changed_files>
{GROUP_FILES — files changed in this group}
</changed_files>
<diff>
{GROUP_DIFF — only the diff for this group's files}
</diff>
<file_contents>
{Read and include full content of each changed file in this group}
</file_contents>
You are reviewing a specific subset of changes heading into an npm release. Focus exclusively on THIS change group. Other groups are reviewed by parallel agents.
ANALYSIS CHECKLIST:
1. **Intent Clarity**: What is this change trying to do? Is the intent clear from the code and commit messages? If you have to guess, that's a finding.
2. **Correctness**: Trace through the logic for 3+ scenarios. Does the code actually do what it claims? Off-by-one errors, null handling, async edge cases, resource cleanup.
3. **Breaking Changes**: Does this change alter any public API, config format, CLI behavior, or hook contract? If yes, is it backward compatible? Would existing users be surprised?
4. **Pattern Adherence**: Does the new code follow the established patterns visible in the existing file contents? New patterns where old ones exist = finding.
5. **Edge Cases**: What inputs or conditions would break this? Empty arrays, undefined values, concurrent calls, very large inputs, missing config fields.
6. **Error Handling**: Are errors properly caught and propagated? No empty catch blocks? No swallowed promises?
7. **Type Safety**: Any `as any`, `@ts-ignore`, `@ts-expect-error`? Loose typing where strict is possible?
8. **Test Coverage**: Are the behavioral changes covered by tests? Are the tests meaningful or just coverage padding?
9. **Side Effects**: Could this change break something in a different module? Check imports and exports — who depends on what changed?
10. **Release Risk**: On a scale of SAFE / CAUTION / RISKY — how confident are you this change won't cause issues in production?
OUTPUT FORMAT:
<group_name>{GROUP_NAME}</group_name>
<verdict>PASS or FAIL</verdict>
<risk>SAFE / CAUTION / RISKY</risk>
<summary>2-3 sentence assessment of this change group</summary>
<has_breaking_changes>YES or NO</has_breaking_changes>
<breaking_change_details>If YES, describe what breaks and for whom</breaking_change_details>
<findings>
For each finding:
- [CRITICAL/MAJOR/MINOR] Category: Description
- File: path (line range)
- Evidence: specific code reference
- Suggestion: how to fix
</findings>
<blocking_issues>Issues that MUST be fixed before publish. Empty if PASS.</blocking_issues>
""")
```
### Layer 2: Holistic Review via /review-work (5 agents)
Spawn a sub-agent that loads the `/review-work` skill. The review-work skill internally launches 5 parallel agents: Oracle (goal verification), unspecified-high (QA execution), Oracle (code quality), Oracle (security), unspecified-high (context mining). All 5 must pass for the review to pass.
```
task(
category="unspecified-high",
run_in_background=true,
load_skills=["review-work"],
description="Run /review-work on all unpublished changes",
prompt="""
Run /review-work on the unpublished changes between v{PUBLISHED} and HEAD.
GOAL: Review all changes heading into npm publish of oh-my-openagent. These changes span {COMMIT_COUNT} commits across {FILE_COUNT} files.
CONSTRAINTS:
- This is a plugin published to npm — public API stability matters
- TypeScript strict mode, Bun runtime
- No `as any`, `@ts-ignore`, `@ts-expect-error`
- Factory pattern (createXXX) for tools, hooks, agents
- kebab-case files, barrel exports, no catch-all files
BACKGROUND: Pre-publish review of oh-my-openagent, an OpenCode plugin with 1268 TypeScript files, 160k LOC. Changes since v{PUBLISHED} are about to be published.
The diff base is: git diff v{PUBLISHED}..HEAD
Follow the /review-work skill flow exactly — launch all 5 review agents and collect results. Do NOT skip any of the 5 agents.
""")
```
### Layer 3: Oracle Release Synthesis (1 agent)
The oracle gets the full picture — all commits, full diff stat, and changed file list. It provides the final release readiness assessment.
```
task(
subagent_type="oracle",
run_in_background=true,
load_skills=[],
description="Oracle: overall release synthesis and version bump recommendation",
{CHANGED_FILES — full list of modified file paths}
</changed_files>
<full_diff>
{FULL_DIFF — the complete git diff between published version and HEAD}
</full_diff>
<file_contents>
{Read and include full content of KEY changed files — focus on public API surfaces, config schemas, agent definitions, hook registrations, tool registrations}
</file_contents>
You are the final gate before an npm publish. 10 ultrabrain agents are reviewing individual changes and 5 review-work agents are doing holistic review. Your job is the bird's-eye view that those focused reviews might miss.
SYNTHESIS CHECKLIST:
1. **Release Coherence**: Do these changes tell a coherent story? Or is this a grab-bag of unrelated changes that should be split into multiple releases?
2. **Version Bump**: Based on semver:
- PATCH: Bug fixes only, no behavior changes
- MINOR: New features, backward-compatible changes
- MAJOR: Breaking changes to public API, config format, or behavior
Recommend the correct bump with specific justification.
3. **Breaking Changes Audit**: Exhaustively list every change that could break existing users. Check:
"prompt":"I need to add a `max_background_agents` config option to oh-my-opencode that limits how many background agents can run simultaneously. It should be in the plugin config schema with a default of 5. Add validation and make sure the background manager respects it. Create a PR for this.",
"expected_output":"Agent creates worktree, implements config option with schema validation, adds tests, creates PR, iterates through verification gates until merged",
"files":[],
"assertions":[
{"id":"worktree-isolation","text":"Plan uses git worktree in a sibling directory (not main working directory)"},
{"id":"branch-from-dev","text":"Branch is created from origin/dev (not master/main)"},
{"id":"atomic-commits","text":"Plan specifies multiple atomic commits for multi-file changes"},
{"id":"local-validation","text":"Runs bun run typecheck, bun test, and bun run build before pushing"},
{"id":"pr-targets-dev","text":"PR is created targeting dev branch (not master)"},
{"id":"three-gates","text":"Verification loop includes all 3 gates: CI, review-work, and Cubic"},
{"id":"gate-ordering","text":"Gates are checked in order: CI first, then review-work, then Cubic"},
{"id":"cubic-check-method","text":"Cubic check uses gh api to check cubic-dev-ai[bot] reviews for 'No issues found'"},
{"id":"worktree-cleanup","text":"Plan includes worktree cleanup after merge"},
{"id":"real-file-references","text":"Code changes reference actual files in the codebase (config schema, background manager)"}
]
},
{
"id":2,
"prompt":"The atlas hook has a bug where it crashes when boulder.json is missing the worktree_path field. Fix it and land the fix as a PR. Make sure CI passes.",
"expected_output":"Agent creates worktree for the fix branch, adds null check and test for missing worktree_path, creates PR, iterates verification loop",
"files":[],
"assertions":[
{"id":"worktree-isolation","text":"Plan uses git worktree in a sibling directory"},
{"id":"test-added","text":"Test case added for the missing worktree_path scenario"},
{"id":"three-gates","text":"Verification loop includes all 3 gates: CI, review-work, Cubic"},
{"id":"real-atlas-files","text":"References actual atlas hook files in src/hooks/atlas/"},
{"id":"fix-branch-naming","text":"Branch name follows fix/ prefix convention"}
]
},
{
"id":3,
"prompt":"Refactor src/tools/delegate-task/constants.ts to split DEFAULT_CATEGORIES and CATEGORY_MODEL_REQUIREMENTS into separate files. Keep backward compatibility with the barrel export. Make a PR.",
"expected_output":"Agent creates worktree, splits file with atomic commits, ensures imports still work via barrel, creates PR, runs through all gates",
"files":[],
"assertions":[
{"id":"worktree-isolation","text":"Plan uses git worktree in a sibling directory"},
{"id":"multiple-atomic-commits","text":"Uses 2+ commits for the multi-file refactor"},
{"id":"barrel-export","text":"Maintains backward compatibility via barrel re-export in constants.ts or index.ts"},
{"id":"three-gates","text":"Verification loop includes all 3 gates"},
{"id":"real-constants-file","text":"References actual src/tools/delegate-task/constants.ts file and its exports"}
]
},
{
"id":4,
"prompt":"implement issue #100 - we need to add a new built-in MCP for arxiv paper search. just the basic search endpoint, nothing fancy. pr it",
{"id":"worktree-isolation","text":"Plan uses git worktree in a sibling directory"},
{"id":"follows-mcp-pattern","text":"New MCP follows existing pattern from src/mcp/ (websearch, context7, grep_app)"},
{"id":"three-gates","text":"Verification loop includes all 3 gates"},
{"id":"pr-targets-dev","text":"PR targets dev branch"},
{"id":"local-validation","text":"Runs local checks before pushing"}
]
},
{
"id":5,
"prompt":"The comment-checker hook is too aggressive - it's flagging legitimate comments that happen to contain 'Note:' as AI slop. Relax the regex pattern and add test cases for the false positives. Work on a separate branch and make a PR.",
"expected_output":"Agent creates worktree, fixes regex, adds specific test cases for false positive scenarios, creates PR, all three gates pass",
"files":[],
"assertions":[
{"id":"worktree-isolation","text":"Plan uses git worktree in a sibling directory"},
{"id":"real-comment-checker-files","text":"References actual comment-checker hook files in the codebase"},
{"id":"regression-tests","text":"Adds test cases specifically for 'Note:' false positive scenarios"},
{"id":"three-gates","text":"Verification loop includes all 3 gates"},
{"id":"minimal-change","text":"Only modifies regex and adds tests — no unrelated changes"}
{"assertion":"Plan uses git worktree in a sibling directory","reason":"Uses git checkout -b, no worktree isolation"},
{"assertion":"Plan specifies multiple atomic commits for multi-file changes","reason":"Steps listed sequentially but no atomic commit strategy mentioned"},
{"assertion":"Verification loop includes all 3 gates: CI, review-work, and Cubic","reason":"Only mentions CI pipeline in step 6. No review-work or Cubic."},
{"assertion":"Gates are checked in order: CI first, then review-work, then Cubic","reason":"No gate ordering - only CI mentioned"},
{"assertion":"Cubic check uses gh api to check cubic-dev-ai[bot] reviews","reason":"No mention of Cubic at all"},
{"assertion":"Plan includes worktree cleanup after merge","reason":"No worktree used, no cleanup needed"}
]
}
},
{
"eval_name":"bugfix-atlas-null-check",
"with_skill":{
"pass_rate":1.0,
"passed":6,
"total":6,
"duration_seconds":506,
"failed_assertions":[]
},
"without_skill":{
"pass_rate":0.667,
"passed":4,
"total":6,
"duration_seconds":325,
"failed_assertions":[
{"assertion":"Plan uses git worktree in a sibling directory","reason":"No worktree. Steps go directly to creating branch and modifying files."},
{"assertion":"Verification loop includes all 3 gates","reason":"Only mentions CI pipeline (step 5). No review-work or Cubic."}
]
}
},
{
"eval_name":"refactor-split-constants",
"with_skill":{
"pass_rate":1.0,
"passed":5,
"total":5,
"duration_seconds":181,
"failed_assertions":[]
},
"without_skill":{
"pass_rate":0.4,
"passed":2,
"total":5,
"duration_seconds":229,
"failed_assertions":[
{"assertion":"Plan uses git worktree in a sibling directory","reason":"git checkout -b only, no worktree"},
{"assertion":"Uses 2+ commits for the multi-file refactor","reason":"Single atomic commit: 'refactor: split delegate-task constants and category model requirements'"},
{"assertion":"Verification loop includes all 3 gates","reason":"Only mentions typecheck/test/build. No review-work or Cubic."}
]
}
},
{
"eval_name":"new-mcp-arxiv-casual",
"with_skill":{
"pass_rate":1.0,
"passed":5,
"total":5,
"duration_seconds":152,
"failed_assertions":[]
},
"without_skill":{
"pass_rate":0.6,
"passed":3,
"total":5,
"duration_seconds":197,
"failed_assertions":[
{"assertion":"Verification loop includes all 3 gates","reason":"Only mentions bun test/typecheck/build. No review-work or Cubic."}
]
}
},
{
"eval_name":"regex-fix-false-positive",
"with_skill":{
"pass_rate":0.8,
"passed":4,
"total":5,
"duration_seconds":570,
"failed_assertions":[
{"assertion":"Only modifies regex and adds tests — no unrelated changes","reason":"Also proposes config schema change (exclude_patterns) and Go binary update — goes beyond minimal fix"}
]
},
"without_skill":{
"pass_rate":0.6,
"passed":3,
"total":5,
"duration_seconds":399,
"failed_assertions":[
{"assertion":"Plan uses git worktree in a sibling directory","reason":"git checkout -b, no worktree"},
{"assertion":"Verification loop includes all 3 gates","reason":"Only bun test and typecheck. No review-work or Cubic."}
]
}
}
],
"analyst_observations":[
"Three-gates assertion (CI + review-work + Cubic) is the strongest discriminator: 5/5 with-skill vs 0/5 without-skill. Without the skill, agents never know about Cubic or review-work gates.",
"Worktree isolation is nearly as discriminating (5/5 vs 1/5). One without-skill run (eval-4) independently chose worktree, suggesting some agents already know worktree patterns, but the skill makes it consistent.",
"The skill's only failure (eval-5 minimal-change) reveals a potential over-engineering tendency: the skill-guided agent proposed config schema changes and Go binary updates for what should have been a minimal regex fix. Consider adding explicit guidance for fix-type tasks to stay minimal.",
"Duration tradeoff: with-skill is 12% slower on average (340s vs 303s), driven mainly by eval-2 (bugfix) and eval-5 (regex fix) where the skill's thorough verification planning adds overhead. For eval-1 and eval-3-4, with-skill was actually faster.",
"Without-skill duration has lower variance (stddev 78s vs 169s), suggesting the skill introduces more variable execution paths depending on task complexity.",
"Non-discriminating assertions: 'References actual files', 'PR targets dev', 'Runs local checks' — these pass regardless of skill. They validate baseline agent competence, not skill value. Consider removing or downweighting in future iterations.",
"Atomic commits assertion discriminates moderately (2/2 with-skill tested vs 0/2 without-skill tested). Without the skill, agents default to single commits even for multi-file refactors."
- **three-gates** (CI + review-work + Cubic): 5/5 vs 0/5 — strongest signal
- **worktree-isolation**: 5/5 vs 1/5
- **atomic-commits**: 2/2 vs 0/2
- **cubic-check-method**: 1/1 vs 0/1
## Non-Discriminating Assertions
- References actual files: passes in both conditions
- PR targets dev: passes in both conditions
- Runs local checks before pushing: passes in both conditions
## Only With-Skill Failure
- **eval-5 minimal-change**: Skill-guided agent proposed config schema changes and Go binary update for a minimal regex fix. The skill may encourage over-engineering in fix scenarios.
## Analyst Notes
- The skill adds most value for procedural knowledge (verification gates, worktree workflow) that agents cannot infer from codebase alone.
- Duration cost is modest (+12%) and acceptable given the +45% pass rate improvement.
"prompt":"I need to add a `max_background_agents` config option to oh-my-opencode that limits how many background agents can run simultaneously. It should be in the plugin config schema with a default of 5. Add validation and make sure the background manager respects it. Create a PR for this.",
"assertions":[
{
"id":"worktree-isolation",
"text":"Plan uses git worktree in a sibling directory (not main working directory)",
"type":"manual"
},
{
"id":"branch-from-dev",
"text":"Branch is created from origin/dev (not master/main)",
"type":"manual"
},
{
"id":"atomic-commits",
"text":"Plan specifies multiple atomic commits for multi-file changes",
"type":"manual"
},
{
"id":"local-validation",
"text":"Runs bun run typecheck, bun test, and bun run build before pushing",
"type":"manual"
},
{
"id":"pr-targets-dev",
"text":"PR is created targeting dev branch (not master)",
"type":"manual"
},
{
"id":"three-gates",
"text":"Verification loop includes all 3 gates: CI, review-work, and Cubic",
"type":"manual"
},
{
"id":"gate-ordering",
"text":"Gates are checked in order: CI first, then review-work, then Cubic",
"type":"manual"
},
{
"id":"cubic-check-method",
"text":"Cubic check uses gh api to check cubic-dev-ai[bot] reviews for 'No issues found'",
"type":"manual"
},
{
"id":"worktree-cleanup",
"text":"Plan includes worktree cleanup after merge",
"type":"manual"
},
{
"id":"real-file-references",
"text":"Code changes reference actual files in the codebase (config schema, background manager)",
{"text":"Plan uses git worktree in a sibling directory","passed":true,"evidence":"Uses ../omo-wt/feat-max-background-agents"},
{"text":"Branch is created from origin/dev","passed":true,"evidence":"git checkout dev && git pull origin dev, then branch"},
{"text":"Plan specifies multiple atomic commits for multi-file changes","passed":true,"evidence":"2 commits: schema+tests, then concurrency+manager"},
{"text":"Runs bun run typecheck, bun test, and bun run build before pushing","passed":true,"evidence":"Explicit pre-push section with all 3 commands"},
{"text":"PR is created targeting dev branch","passed":true,"evidence":"--base dev in gh pr create"},
{"text":"Verification loop includes all 3 gates: CI, review-work, and Cubic","passed":true,"evidence":"Gate A (CI), Gate B (review-work 5 agents), Gate C (Cubic)"},
{"text":"Gates are checked in order: CI first, then review-work, then Cubic","passed":true,"evidence":"Explicit ordering in verify loop pseudocode"},
{"text":"Cubic check uses gh api to check cubic-dev-ai[bot] reviews","passed":true,"evidence":"Mentions cubic-dev-ai[bot] and 'No issues found' signal"},
{"text":"Plan includes worktree cleanup after merge","passed":true,"evidence":"Phase 4: git worktree remove ../omo-wt/feat-max-background-agents"},
{"text":"Code changes reference actual files in the codebase","passed":true,"evidence":"References src/config/schema/background-task.ts, src/features/background-agent/concurrency.ts, manager.ts"}
**Rationale:** Follows exact same pattern as `maxDepth` and `maxDescendants` — `z.number().int().min(1).optional()`. The field is optional; runtime default of 5 is applied in `ConcurrencyManager`. No barrel export changes needed since `src/config/schema.ts` already does `export * from "./schema/background-task"` and the type is inferred.
`Background agent spawn blocked: ${current} agents running, max is ${max}. Wait for existing tasks to complete or increase background_task.maxBackgroundAgents.`
`Background agent spawn blocked: ${current} agents running, max is ${max}. Wait for existing tasks to complete or increase background_task.maxBackgroundAgents.`
- `BackgroundManager.launch()` checks `concurrencyManager.canSpawnGlobally()` before creating task
- `BackgroundManager.trackTask()` also checks global limit
- On task completion/cancellation/error, call `releaseGlobal()`
- Throw descriptive error when limit hit: `"Background agent spawn blocked: ${current} agents running, max is ${max}. Wait for existing tasks to complete or increase background_task.maxBackgroundAgents."`
### Local Validation
```bash
bun run typecheck
bun test src/config/schema/background-task.test.ts
bun test src/features/background-agent/concurrency.test.ts
**Title:**`feat: add max_background_agents config to limit concurrent background agents`
**Base:**`dev`
---
## Summary
- Add `maxBackgroundAgents` field to `BackgroundTaskConfigSchema` (default: 5, min: 1) to cap total simultaneous background agents across all models/providers
- Enforce the global limit in `BackgroundManager.launch()` and `trackTask()` with descriptive error messages when the limit is hit
- Release global slots on task completion, cancellation, error, and interrupt to prevent slot leaks
## Motivation
The existing concurrency system in `ConcurrencyManager` limits agents **per model/provider** (e.g., 5 concurrent `anthropic/claude-opus-4-6` tasks). However, there is no **global** cap across all models. A user running tasks across multiple providers could spawn an unbounded number of background agents, exhausting system resources.
`max_background_agents` provides a single knob to limit total concurrent background agents regardless of which model they use.
## Config Usage
```jsonc
// .opencode/oh-my-opencode.jsonc
{
"background_task":{
"maxBackgroundAgents":10// default: 5, min: 1
}
}
```
## Changes
| File | What |
|------|------|
| `src/config/schema/background-task.ts` | Add `maxBackgroundAgents` schema field |
Before every push, run all three checks sequentially:
```bash
bun run typecheck && bun test&& bun run build
```
Specific test files to watch:
```bash
bun test src/config/schema/background-task.test.ts
bun test src/features/background-agent/concurrency.test.ts
```
---
## Gate A: CI (`ci.yml`)
### What CI runs
1.**Tests (split):** mock-heavy tests run in isolation (separate `bun test` processes), rest in batch
2.**Typecheck:**`bun run typecheck` (tsc --noEmit)
3.**Build:**`bun run build` (ESM + declarations + schema)
4.**Schema auto-commit:** if generated schema changed, CI commits it
### How to monitor
```bash
gh pr checks <PR_NUMBER> --watch
```
### Common failure scenarios and fixes
| Failure | Likely Cause | Fix |
|---------|-------------|-----|
| Typecheck error | New field not matching existing type imports | Verify `BackgroundTaskConfig` type is auto-inferred from schema, no manual type updates needed |
| Test failure | Test assertion wrong or missing import | Fix test, re-push |
| Build failure | Import cycle or missing export | Check barrel exports in `src/config/schema.ts` (already re-exports via `export *`) |
| Schema auto-commit | Generated JSON schema changed | Pull the auto-commit, rebase if needed |
### Recovery
```bash
# Read CI logs
gh run view <RUN_ID> --log-failed
# Fix, commit, push
git add -A && git commit -m "fix: address CI failure"&& git push
```
---
## Gate B: review-work (5 parallel agents)
### What it checks
Run `/review-work` which launches 5 background sub-agents:
| Agent | Role | What it checks for this PR |
|-------|------|---------------------------|
| Oracle (goal) | Goal/constraint verification | Does `maxBackgroundAgents` actually limit agents? Is default 5? Is min 1? |
| Oracle (quality) | Code quality | Follows existing patterns? No catch-all files? Under 200 LOC? given/when/then tests? |
| Oracle (security) | Security review | No injection vectors, no unsafe defaults, proper input validation via Zod |
| Hephaestus (context) | Context mining | Checks git history, related issues, ensures no duplicate/conflicting PRs |
### Pass criteria
All 5 agents must pass. Any single failure blocks.
### Common failure scenarios and fixes
| Agent | Likely Issue | Fix |
|-------|-------------|-----|
| Oracle (goal) | Global limit not enforced in all exit paths (completion, cancel, error, interrupt) | Audit every status transition in `manager.ts` that should call `releaseGlobal()` |
| Oracle (quality) | Test style not matching given/when/then | Restructure tests with `#given`/`#when`/`#then` describe nesting |
| Oracle (quality) | File exceeds 200 LOC | `concurrency.ts` is 137 LOC + ~25 new = ~162 LOC, safe. `manager.ts` is already large but we're adding ~20 lines to existing methods, not creating new responsibility |
| Oracle (security) | Integer overflow or negative values | Zod `.int().min(1)` handles this at config parse time |
| Hephaestus (QA) | Test actually fails when run | Run tests locally first, fix before push |
Cubic is an automated code review bot that analyzes the PR diff. It must respond with "No issues found" for the gate to pass.
### Common failure scenarios and fixes
| Issue | Likely Cause | Fix |
|-------|-------------|-----|
| "Missing error handling" | `releaseGlobal()` not called in some error path | Add `releaseGlobal()` to the missed path |
| "Inconsistent naming" | Field name doesn't match convention | Use `maxBackgroundAgents` (camelCase in schema, `max_background_agents` in JSONC config) |
| "Missing documentation" | No JSDoc on new public methods | Add JSDoc comments to `canSpawnGlobally()`, `acquireGlobal()`, `releaseGlobal()`, `getMaxBackgroundAgents()` |
| "Test coverage gap" | Missing edge case test | Add the specific test case Cubic identifies |
### Recovery
```bash
# Read Cubic's review
gh api repos/code-yeongyu/oh-my-openagent/pulls/<PR_NUMBER>/reviews
No iteration cap. Loop continues until all three gates pass simultaneously in a single iteration.
---
## Risk Assessment
| Risk | Probability | Mitigation |
|------|------------|------------|
| Slot leak (global count never decremented) | Medium | Audit every exit path: `tryCompleteTask`, `cancelTask`, `handleEvent(session.error)`, `startTask` prompt error, `resume` prompt error |
| Race condition on global count | Low | `globalRunningCount` is synchronous (single-threaded JS), no async gap between check and increment in `launch()` |
| Breaking existing behavior | Low | Default is 5, same as existing per-model default. Users with <5 total agents see no change |
| `manager.ts` exceeding 200 LOC | Already exceeded | File is already ~1500 LOC (exempt due to being a core orchestration class with many methods). Our changes add ~20 lines to existing methods, not a new responsibility |
{"text":"Plan uses git worktree in a sibling directory","passed":false,"evidence":"Uses git checkout -b, no worktree isolation"},
{"text":"Branch is created from origin/dev","passed":true,"evidence":"git checkout -b feat/max-background-agents dev"},
{"text":"Plan specifies multiple atomic commits for multi-file changes","passed":false,"evidence":"Steps listed sequentially but no atomic commit strategy mentioned"},
{"text":"Runs bun run typecheck, bun test, and bun run build before pushing","passed":true,"evidence":"Step 6 runs typecheck and tests, Step 8 implies push after verification"},
{"text":"PR is created targeting dev branch","passed":true,"evidence":"Step 8 mentions creating PR"},
{"text":"Verification loop includes all 3 gates: CI, review-work, and Cubic","passed":false,"evidence":"Only mentions CI pipeline in step 6. No review-work or Cubic."},
{"text":"Gates are checked in order: CI first, then review-work, then Cubic","passed":false,"evidence":"No gate ordering - only CI mentioned"},
{"text":"Cubic check uses gh api to check cubic-dev-ai[bot] reviews","passed":false,"evidence":"No mention of Cubic at all"},
{"text":"Plan includes worktree cleanup after merge","passed":false,"evidence":"No worktree used, no cleanup needed"},
{"text":"Code changes reference actual files in the codebase","passed":true,"evidence":"References actual files with detailed design decisions"}
/** Maximum number of background agents that can run simultaneously across all models/providers (default: no global limit, only per-model limits apply) */
**What changed:** Added `maxBackgroundAgents` field after `maxDescendants` (grouped with other limit fields). Uses `z.number().int().min(1).optional()` matching the pattern of `maxDepth` and `maxDescendants`.
* Get current count for a model (for testing/debugging)
*/
getCount(model: string):number{
returnthis.counts.get(model)??0
}
/**
* Get queue length for a model (for testing/debugging)
*/
getQueueLength(model: string):number{
returnthis.queues.get(model)?.length??0
}
/**
* Get current global count across all models (for testing/debugging)
*/
getGlobalCount():number{
returnthis.globalCount
}
/**
* Get global queue length (for testing/debugging)
*/
getGlobalQueueLength():number{
returnthis.globalQueue.length
}
}
```
**What changed:**
- Added `globalCount` field to track total active agents across all keys
- Added `globalQueue` for tasks waiting on the global limit
- Added `getGlobalLimit()` method to read `maxBackgroundAgents` from config
- Modified `acquire()` to check both per-model AND global limits
- Modified `release()` to handle global queue handoff and decrement global count
- Modified `clear()` to reset global state
- Added `getGlobalCount()` and `getGlobalQueueLength()` for testing
**Important design note:** The `release()` implementation above is a simplified version. In practice, the global queue handoff is tricky because we need to know which model the global waiter was trying to acquire for. A cleaner approach would be to store the model key in the QueueEntry. Let me refine:
### Refined approach (simpler, more correct)
Instead of a separate global queue, a simpler approach is to check the global limit inside `acquire()` and use a single queue per model. When global capacity frees up on `release()`, we try to drain any model's queue:
This refined approach keeps all waiters in per-model queues (no separate global queue), and on release, tries to drain waiters from any model queue that was blocked by the global limit.
Add a `max_background_agents` config option to oh-my-openagent that limits total simultaneous background agents across all models/providers. Currently, concurrency is only limited per-model/provider key (default 5 per key). This new option adds a **global ceiling** on total running background agents.
## Step-by-Step Plan
### Step 1: Create feature branch
```bash
git checkout -b feat/max-background-agents dev
```
### Step 2: Add `max_background_agents` to BackgroundTaskConfigSchema
**File:**`src/config/schema/background-task.ts`
- Add `maxBackgroundAgents` field to the Zod schema with `z.number().int().min(1).optional()`
- This follows the existing pattern of `maxDepth` and `maxDescendants` (integer, min 1, optional)
- The field name uses camelCase to match existing schema fields (`defaultConcurrency`, `maxDepth`, `maxDescendants`)
- No `.default()` needed since the hardcoded fallback of 5 lives in `ConcurrencyManager`
### Step 3: Modify `ConcurrencyManager` to enforce global limit
| `src/features/background-agent/concurrency.test.ts` | Add global limit enforcement tests |
## Files NOT Modified (intentional)
| File | Reason |
|------|--------|
| `src/config/schema/oh-my-openagent-config.ts` | No change needed - `BackgroundTaskConfigSchema` is already composed into root schema via `background_task` field |
| `src/create-managers.ts` | No change needed - `pluginConfig.background_task` already passed to `BackgroundManager` constructor |
| `src/features/background-agent/manager.ts` | No change needed - already passes config to `ConcurrencyManager` |
| `src/plugin-config.ts` | No change needed - `background_task` is a simple object field, uses default override merge |
1.**Field name `maxBackgroundAgents`** - camelCase to match existing schema fields (`maxDepth`, `maxDescendants`, `defaultConcurrency`). The user-facing JSONC config key is also camelCase per existing convention in `background_task` section.
2.**Global limit vs per-model limit** - The global limit is a ceiling across ALL concurrency keys. Per-model limits still apply independently. A task needs both a per-model slot AND a global slot to proceed.
3.**Default of 5** - Matches the existing hardcoded default in `getConcurrencyLimit()`. When `maxBackgroundAgents` is not set, no global limit is enforced (only per-model limits apply), preserving backward compatibility.
4.**Queue behavior** - When global limit is reached, tasks wait in the same FIFO queue mechanism. The global check happens inside `acquire()` before the per-model check.
5.**0 means Infinity** - Following the existing pattern where `defaultConcurrency: 0` means unlimited, `maxBackgroundAgents: 0` would also mean no global limit.
**Title:** feat: add `maxBackgroundAgents` config to limit total simultaneous background agents
**Body:**
## Summary
- Add `maxBackgroundAgents` field to `BackgroundTaskConfigSchema` that enforces a global ceiling on total running background agents across all models/providers
- Modify `ConcurrencyManager` to track global count and enforce the limit alongside existing per-model limits
- Add schema validation tests and concurrency enforcement tests
## Motivation
Currently, concurrency is only limited per model/provider key (default 5 per key). On resource-constrained machines or when using many different models, the total number of background agents can grow unbounded (5 per model x N models). This config option lets users set a hard ceiling.
- Added `globalCount` tracking total active agents across all concurrency keys
- Added `getGlobalLimit()` reading `maxBackgroundAgents` from config (defaults to `Infinity` = no global limit)
- Modified `acquire()` to check both per-model AND global capacity
- Modified `release()` to decrement global count and drain cross-model waiters blocked by global limit
- Modified `clear()` to reset global state
- Added `getGlobalCount()` / `getGlobalQueueLength()` for testing
### Tests
-`src/config/schema/background-task.test.ts`: 6 test cases for schema validation (valid, min boundary, below min, negative, non-integer, undefined)
-`src/features/background-agent/concurrency.test.ts`: 8 test cases for global limit enforcement (cross-model blocking, release unblocking, per-model vs global interaction, no-config default, clear reset)
## Config Example
```jsonc
{
"background_task":{
"maxBackgroundAgents":5,
"defaultConcurrency":3
}
}
```
## Backward Compatibility
- When `maxBackgroundAgents` is not set (default), no global limit is enforced - behavior is identical to before
- Existing `defaultConcurrency`, `providerConcurrency`, and `modelConcurrency` continue to work unchanged
| No global limit = no enforcement | No config, acquire 6 different models | All succeed |
| Clear resets global count | Acquire 2, clear | `getGlobalCount()` is 0 |
### Existing Test Regression
```bash
bun test src/features/background-agent/concurrency.test.ts
bun test src/config/schema/background-task.test.ts
bun test src/config/schema.test.ts
```
All existing tests must continue to pass unchanged.
## 3. Integration Verification
### Config Loading Path
Verify the config flows correctly through the system:
1.**Schema → Type**: `BackgroundTaskConfig` type auto-includes `maxBackgroundAgents` via `z.infer`
2.**Config file → Schema**: `loadConfigFromPath()` in `plugin-config.ts` uses `OhMyOpenAgentConfigSchema.safeParse()` which includes `BackgroundTaskConfigSchema`
3.**Config → Manager**: `create-managers.ts` passes `pluginConfig.background_task` to `BackgroundManager` constructor
4.**Manager → ConcurrencyManager**: `BackgroundManager` constructor passes config to `new ConcurrencyManager(config)`
5.**ConcurrencyManager → Enforcement**: `acquire()` reads `config.maxBackgroundAgents` via `getGlobalLimit()`
No changes needed in steps 2-4 since the field is optional and the existing plumbing passes the entire `BackgroundTaskConfig` object.
### Manual Config Test
Create a test config to verify parsing:
```bash
echo'{ "background_task": { "maxBackgroundAgents": 3 } }'| bun -e "
"prompt":"The atlas hook has a bug where it crashes when boulder.json is missing the worktree_path field. Fix it and land the fix as a PR. Make sure CI passes.",
"assertions":[
{
"id":"worktree-isolation",
"text":"Plan uses git worktree in a sibling directory",
**Rationale**: `readBoulderState` casts raw `JSON.parse()` output as `BoulderState` without validating individual fields. When boulder.json has `"worktree_path": null` (valid JSON from manual edits, corrupted state, or external tools), the runtime type is `null` but TypeScript type says `string | undefined`. This sanitization ensures downstream code always gets the correct type.
---
## File 2: `src/hooks/atlas/idle-event.ts`
**Change**: Add defensive string type guard before passing `worktree_path` to continuation functions.
**Rationale**: Belt-and-suspenders defense. Even though `readBoulderState` now sanitizes, direct `writeBoulderState` calls elsewhere could still produce invalid state. The `typeof` check is zero-cost and prevents any possibility of `null` or non-string values leaking through.
---
## File 3: `src/hooks/atlas/index.test.ts`
**Change**: Add test cases for missing `worktree_path` scenarios within the existing `session.idle handler` describe block.
```typescript
test("should inject continuation when boulder.json has no worktree_path field",async()=>{
fix(atlas): prevent crash when boulder.json missing worktree_path
```
# PR Body
## Summary
- Fix runtime type violation in atlas hook when `boulder.json` lacks `worktree_path` field
- Add `worktree_path` sanitization in `readBoulderState()` to reject non-string values (e.g., `null` from manual edits)
- Add defensive `typeof` guards in `idle-event.ts` before passing worktree path to continuation injection
- Add test coverage for missing and null `worktree_path` scenarios
## Problem
`readBoulderState()` in `src/features/boulder-state/storage.ts` casts raw `JSON.parse()` output directly as `BoulderState` via `return parsed as BoulderState`. This bypasses TypeScript's type system entirely at runtime.
When `boulder.json` is missing the `worktree_path` field (common for boulders created before worktree support was added, or created without `--worktree` flag), `boulderState.worktree_path` is `undefined` which is handled correctly. However, when boulder.json has `"worktree_path": null` (possible from manual edits, external tooling, or corrupted state), the runtime type becomes `null` which violates the TypeScript type `string | undefined`.
2.`idle-event.ts:scheduleRetry()` callback → same chain
While the `boulder-continuation-injector.ts` handles falsy values via `worktreePath ? ... : ""`, the type mismatch can cause subtle downstream issues and violates the contract of the `BoulderState` interface.
{"text":"Plan uses git worktree in a sibling directory","passed":false,"evidence":"No worktree. Steps go directly to creating branch and modifying files."},
{"text":"Fix is minimal — adds null check, doesn't refactor unrelated code","passed":true,"evidence":"Focused fix though also adds try/catch in setTimeout (reasonable secondary fix)"},
{"text":"Test case added for the missing worktree_path scenario","passed":true,"evidence":"Detailed test plan for missing/null/malformed boulder.json"},
{"text":"Verification loop includes all 3 gates","passed":false,"evidence":"Only mentions CI pipeline (step 5). No review-work or Cubic."},
{"text":"References actual atlas hook files","passed":true,"evidence":"References idle-event.ts, storage.ts with line numbers"},
{"text":"Branch name follows fix/ prefix convention","passed":true,"evidence":"fix/atlas-hook-missing-worktree-path"}
**Rationale:** Validates that required fields (`active_plan`, `plan_name`) are strings. Strips `worktree_path` if it's present but not a string (e.g., `null`, number). This prevents downstream crashes from `existsSync(undefined)` and ensures type safety at the boundary.
---
## Change 2: Add try/catch in setTimeout retry callback
**Rationale:** The async callback in setTimeout creates a floating promise. Without try/catch, any error becomes an unhandled rejection that can crash the process. This is the critical safety net even after the `readBoulderState` fix.
**Rationale:** Defense-in-depth. Even though `readBoulderState` now validates `active_plan`, the `getPlanProgress` function is a public API that could be called from other paths with invalid input. A `typeof` check before `existsSync` prevents the TypeError from `existsSync(undefined)`.
returnparsedasBoulderState// <-- unsafe cast, no field validation
```
It validates `session_ids` but NOT `active_plan`, `plan_name`, or `worktree_path`. This means a malformed `boulder.json` (e.g., `{}` or missing key fields) passes through and downstream code crashes.
### Crash Path
1.`boulder.json` is written without required fields (manual edit, corruption, partial write)
2.`readBoulderState()` returns it as `BoulderState` with `active_plan: undefined`
3. Multiple call sites pass `boulderState.active_plan` to `getPlanProgress(planPath: string)`:
4.`getPlanProgress()` calls `existsSync(undefined)` which throws: `TypeError: The "path" argument must be of type string`
### worktree_path-Specific Issues
When `worktree_path` field is missing from `boulder.json`:
- The `idle-event.ts``scheduleRetry` setTimeout callback (lines 62-88) has NO try/catch. An unhandled promise rejection from the async callback crashes the process.
-`readBoulderState()` returns `worktree_path: undefined` which itself is handled in `boulder-continuation-injector.ts` (line 42 uses truthiness check), but the surrounding code in the setTimeout lacks error protection.
### Secondary Issue: Unhandled Promise in setTimeout
- Fix crash in atlas hook when `boulder.json` is missing `worktree_path` (or other required fields) by hardening `readBoulderState()` validation
- Wrap the unprotected `setTimeout` retry callback in `idle-event.ts` with try/catch to prevent unhandled promise rejections
- Add defensive type guard in `getPlanProgress()` to prevent `existsSync(undefined)` TypeError
## Context
When `boulder.json` is malformed or manually edited to omit fields, `readBoulderState()` returns an object cast as `BoulderState` without validating required fields. Downstream callers like `getPlanProgress(boulderState.active_plan)` then pass `undefined` to `existsSync()`, which throws a TypeError. This crash is especially dangerous in the `setTimeout` retry callback in `idle-event.ts`, where the error becomes an unhandled promise rejection.
## Changes
### `src/features/boulder-state/storage.ts`
-`readBoulderState()`: Validate `active_plan` and `plan_name` are strings (return `null` if not)
-`readBoulderState()`: Strip `worktree_path` if present but not a string type
-`getPlanProgress()`: Add `typeof planPath !== "string"` guard before `existsSync`
### `src/hooks/atlas/idle-event.ts`
- Wrap `scheduleRetry` setTimeout async callback body in try/catch
### Tests
-`src/features/boulder-state/storage.test.ts`: 5 new tests for missing/malformed fields
-`src/hooks/atlas/index.test.ts`: 2 new tests for worktree_path presence/absence in continuation prompt
"prompt":"Refactor src/tools/delegate-task/constants.ts to split DEFAULT_CATEGORIES and CATEGORY_MODEL_REQUIREMENTS into separate files. Keep backward compatibility with the barrel export. Make a PR.",
"assertions":[
{
"id":"worktree-isolation",
"text":"Plan uses git worktree in a sibling directory",
"type":"manual"
},
{
"id":"multiple-atomic-commits",
"text":"Uses 2+ commits for the multi-file refactor",
"type":"manual"
},
{
"id":"barrel-export",
"text":"Maintains backward compatibility via barrel re-export in constants.ts or index.ts",
"type":"manual"
},
{
"id":"three-gates",
"text":"Verification loop includes all 3 gates",
"type":"manual"
},
{
"id":"real-constants-file",
"text":"References actual src/tools/delegate-task/constants.ts file and its exports",
{"text":"Plan uses git worktree in a sibling directory","passed":true,"evidence":"../omo-wt/refactor-delegate-task-constants"},
{"text":"Uses 2+ commits for the multi-file refactor","passed":true,"evidence":"Commit 1: category defaults+appends, Commit 2: plan agent prompt+names"},
{"text":"Maintains backward compatibility via barrel re-export","passed":true,"evidence":"constants.ts converted to re-export from 4 new files, full import map verified"},
{"text":"Verification loop includes all 3 gates","passed":true,"evidence":"Gate A (CI), Gate B (review-work), Gate C (Cubic)"},
{"text":"References actual src/tools/delegate-task/constants.ts","passed":true,"evidence":"654 lines analyzed, 4 responsibilities identified, full external+internal import map"}
refactor(delegate-task): split constants.ts into focused modules
```
# PR Body
## Summary
- Split the 654-line `src/tools/delegate-task/constants.ts` into 4 single-responsibility modules: `default-categories.ts`, `category-prompt-appends.ts`, `plan-agent-prompt.ts`, `plan-agent-names.ts`
-`constants.ts` becomes a pure re-export barrel, preserving all existing import paths (`from "./constants"` and `from "./delegate-task"`)
- Zero import changes across the codebase (6 external + 7 internal consumers verified)
## Motivation
`constants.ts` at 654 lines violates the project's 200 LOC soft limit (`modular-code-enforcement.md` rule) and bundles 4 unrelated responsibilities: category model configs, category prompt text, plan agent prompts, and plan agent name utilities.
All 13 consumers continue importing from `"./constants"` or `"../tools/delegate-task/constants"` with zero changes. The re-export chain: new modules -> `constants.ts` -> `index.ts` -> external consumers.
## Note on CATEGORY_MODEL_REQUIREMENTS
`CATEGORY_MODEL_REQUIREMENTS` already lives in `src/shared/model-requirements.ts`. No move needed. The AGENTS.md reference to it being in `constants.ts` is outdated.
## Testing
-`bun run typecheck` passes
-`bun test src/tools/delegate-task/` passes (all existing tests untouched)
{"text":"Plan uses git worktree in a sibling directory","passed":false,"evidence":"git checkout -b only, no worktree"},
{"text":"Uses 2+ commits for the multi-file refactor","passed":false,"evidence":"Single atomic commit: 'refactor: split delegate-task constants and category model requirements'"},
{"text":"Maintains backward compatibility via barrel re-export","passed":true,"evidence":"Re-exports from new files, zero consumer changes"},
{"text":"Verification loop includes all 3 gates","passed":false,"evidence":"Only mentions typecheck/test/build. No review-work or Cubic."},
{"text":"References actual src/tools/delegate-task/constants.ts","passed":true,"evidence":"654 lines, detailed responsibility breakdown, full import maps"}
> Note: Each `*_CATEGORY_PROMPT_APPEND` contains the full template string from the original. Abbreviated with `...` here for readability. The actual code would contain the complete unmodified prompt text.
`src/tools/delegate-task/constants.ts` is **654 lines** with 6 distinct responsibilities. Violates the 200 LOC modular-code-enforcement rule. `CATEGORY_MODEL_REQUIREMENTS` is actually in `src/shared/model-requirements.ts` (311 lines, also violating 200 LOC), not in `constants.ts`.
## Pre-Flight Analysis
### Current `constants.ts` responsibilities:
1.**Category prompt appends** (8 template strings, ~274 LOC prompt text)
- Extract `CATEGORY_MODEL_REQUIREMENTS` from `src/shared/model-requirements.ts` (311 LOC) into `category-model-requirements.ts`, bringing both files under the 200 LOC limit
- Convert original files to barrel re-exports for 100% backward compatibility (zero consumer changes)
## Motivation
Both files violate the project's 200 LOC modular-code-enforcement rule. `constants.ts` mixed 6 unrelated responsibilities (category configs, prompt templates, plan agent builders, identity utils). `model-requirements.ts` mixed agent and category model requirements.
## Changes
### `src/tools/delegate-task/`
| New File | Responsibility |
|----------|---------------|
| `default-categories.ts` | `DEFAULT_CATEGORIES` record |
| `category-descriptions.ts` | `CATEGORY_DESCRIPTIONS` record |
| `plan-agent-prompt.ts` | Plan agent system prompts + builder functions |
| `plan-agent-identity.ts` | `isPlanAgent`, `isPlanFamily` + name lists |
`constants.ts` is now a barrel re-export file (~25 LOC).
### `src/shared/`
| New File | Responsibility |
|----------|---------------|
| `category-model-requirements.ts` | `CATEGORY_MODEL_REQUIREMENTS` record |
`model-requirements.ts` retains types + `AGENT_MODEL_REQUIREMENTS` and re-exports `CATEGORY_MODEL_REQUIREMENTS`.
## Backward Compatibility
All existing import paths (`from "./constants"`, `from "../../tools/delegate-task/constants"`, `from "../../shared/model-requirements"`) continue to work unchanged. Zero consumer files modified.
## Testing
-`bun run typecheck` passes
-`bun test` passes (existing `tools.test.ts` validates all re-exported symbols)
Expected: 0 errors. This confirms all 14 consumer files (8 internal + 6 external) resolve their imports correctly through the barrel re-exports.
## 2. Behavioral Regression
### 2a. Existing test suite
```bash
bun test src/tools/delegate-task/tools.test.ts
```
This test file imports `DEFAULT_CATEGORIES`, `CATEGORY_PROMPT_APPENDS`, `CATEGORY_DESCRIPTIONS`, `isPlanAgent`, `PLAN_AGENT_NAMES`, `isPlanFamily`, `PLAN_FAMILY_NAMES` from `./constants`. If the barrel re-export is correct, all these tests pass unchanged.
### 2b. Category resolver tests
```bash
bun test src/tools/delegate-task/category-resolver.test.ts
```
This exercises `resolveCategoryConfig()` which imports `DEFAULT_CATEGORIES` and `CATEGORY_PROMPT_APPENDS` from `./constants` and `CATEGORY_MODEL_REQUIREMENTS` from `../../shared/model-requirements`.
### 2c. Model selection tests
```bash
bun test src/tools/delegate-task/model-selection.test.ts
```
### 2d. Merge categories tests
```bash
bun test src/shared/merge-categories.test.ts
```
Imports `DEFAULT_CATEGORIES` from `../tools/delegate-task/constants` (external path).
### 2e. Full test suite
```bash
bun test
```
## 3. Build Verification
```bash
bun run build
```
Confirms ESM bundle + declarations emit correctly with the new file structure.
## 4. Export Completeness Verification
### 4a. Verify `constants.ts` re-exports match original exports
Cross-check that every symbol previously exported from `constants.ts` is still exported. The original file exported these symbols:
-`VISUAL_CATEGORY_PROMPT_APPEND`
-`ULTRABRAIN_CATEGORY_PROMPT_APPEND`
-`ARTISTRY_CATEGORY_PROMPT_APPEND`
-`QUICK_CATEGORY_PROMPT_APPEND`
-`UNSPECIFIED_LOW_CATEGORY_PROMPT_APPEND`
-`UNSPECIFIED_HIGH_CATEGORY_PROMPT_APPEND`
-`WRITING_CATEGORY_PROMPT_APPEND`
-`DEEP_CATEGORY_PROMPT_APPEND`
-`DEFAULT_CATEGORIES`
-`CATEGORY_PROMPT_APPENDS`
-`CATEGORY_DESCRIPTIONS`
-`PLAN_AGENT_SYSTEM_PREPEND_STATIC_BEFORE_SKILLS`
-`PLAN_AGENT_SYSTEM_PREPEND_STATIC_AFTER_SKILLS`
-`buildPlanAgentSkillsSection`
-`buildPlanAgentSystemPrepend`
-`PLAN_AGENT_NAMES`
-`isPlanAgent`
-`PLAN_FAMILY_NAMES`
-`isPlanFamily`
All 19 must be re-exported from the barrel.
### 4b. Verify `model-requirements.ts` re-exports match original exports
Original exports: `FallbackEntry`, `ModelRequirement`, `AGENT_MODEL_REQUIREMENTS`, `CATEGORY_MODEL_REQUIREMENTS`. All 4 must still be available.
## 5. LOC Compliance Check
Verify each new file is under 200 LOC (excluding prompt template text per modular-code-enforcement rule):
| File | Expected Total LOC | Non-prompt LOC | Compliant? |
- If no remote MCP exists for arXiv, this would need to be a stdio MCP or a custom HTTP wrapper. For this plan, we assume a remote MCP endpoint pattern consistent with existing built-ins.
### Step 2: Update `src/mcp/types.ts`
- Add `"arxiv"` to `McpNameSchema` enum: `z.enum(["websearch", "context7", "grep_app", "arxiv"])`
### Step 3: Update `src/mcp/index.ts`
- Import `arxiv` from `"./arxiv"`
- Add conditional block in `createBuiltinMcps()`:
```typescript
if (!disabledMcps.includes("arxiv")) {
mcps.arxiv = arxiv
}
```
### Step 4: Create `src/mcp/arxiv.test.ts`
- Test arXiv config shape (type, url, enabled, oauth)
- Follow pattern from existing tests (given/when/then)
> **Note:** The URL `https://mcp.arxiv.org` is a placeholder. The actual endpoint needs to be verified. If no hosted arXiv MCP exists, alternatives include community-hosted servers or a self-hosted wrapper around the arXiv REST API (`export.arxiv.org/api/query`). This would be the single blocker requiring resolution before merging.
Pattern followed: `grep-app.ts` (static export, no auth, no config factory needed since arXiv API is public).
- The arXiv API is public (`export.arxiv.org/api/query`) but has no native MCP endpoint
- Need to identify a hosted remote MCP server for arXiv (e.g., community-maintained or self-hosted)
- If no hosted endpoint exists, consider alternatives: (a) use a community-hosted one from the MCP registry, (b) flag this in the PR and propose a follow-up for hosting
- For this plan, assume a remote MCP endpoint at a URL like `https://mcp.arxiv.org` or a third-party equivalent
## Implementation Steps (4 files to modify, 2 files to create)
### Step 1: Create `src/mcp/arxiv.ts`
- Follow the `grep-app.ts` pattern (simplest: static export, no auth, no config)
- arXiv API is public, so no API key needed
- Export a `const arxiv` with `type: "remote"`, `url`, `enabled: true`, `oauth: false`
### Step 2: Update `src/mcp/types.ts`
- Add `"arxiv"` to the `McpNameSchema` z.enum array
- This makes it a recognized built-in MCP name
### Step 3: Update `src/mcp/index.ts`
- Import `arxiv` from `"./arxiv"`
- Add the `if (!disabledMcps.includes("arxiv"))` block inside `createBuiltinMcps()`
- Place it after `grep_app` block (alphabetical among new additions, or last)
### Step 4: Update `src/mcp/index.test.ts`
- Update test "should return all MCPs when disabled_mcps is empty" to expect 4 MCPs instead of 3
- Update test "should filter out all built-in MCPs when all disabled" to include "arxiv" in the disabled list and expect it not present
- Update test "should handle empty disabled_mcps by default" to expect 4 MCPs
- Update test "should only filter built-in MCPs, ignoring unknown names" to expect 4 MCPs
- Add new test: "should filter out arxiv when disabled"
### Step 5: Create `src/mcp/arxiv.test.ts` (optional, only if factory pattern used)
- If using static export (like grep-app), no separate test file needed
- If using factory with config, add tests following `websearch.test.ts` pattern
If the endpoint doesn't exist or returns non-2xx, the MCP will silently fail at runtime (MCP framework handles connection errors gracefully). This is acceptable for a built-in MCP but should be documented.
## 7. Regression Check
Verify no existing functionality is broken:
-`bun test` (full suite) passes
- Existing 3 MCPs (websearch, context7, grep_app) still work
-`disabled_mcps` config still works for all MCPs
-`mcp-config-handler.test.ts` passes (if it has count-based assertions, update them)
## Checklist
- [ ]`bun run typecheck` passes
- [ ]`bun test src/mcp/` passes (all tests green)
- [ ]`bun run build` succeeds
- [ ]`lsp_diagnostics` clean on all 4 changed files
- [ ] arXiv MCP endpoint URL verified reachable
- [ ] No hardcoded MCP count assertions broken elsewhere in codebase
"prompt":"The comment-checker hook is too aggressive - it's flagging legitimate comments that happen to contain 'Note:' as AI slop. Relax the regex pattern and add test cases for the false positives. Work on a separate branch and make a PR.",
"assertions":[
{
"id":"worktree-isolation",
"text":"Plan uses git worktree in a sibling directory",
"type":"manual"
},
{
"id":"real-comment-checker-files",
"text":"References actual comment-checker hook files in the codebase",
"type":"manual"
},
{
"id":"regression-tests",
"text":"Adds test cases specifically for 'Note:' false positive scenarios",
"type":"manual"
},
{
"id":"three-gates",
"text":"Verification loop includes all 3 gates",
"type":"manual"
},
{
"id":"minimal-change",
"text":"Only modifies regex and adds tests — no unrelated changes",
The comment-checker delegates to an external Go binary (`code-yeongyu/go-claude-code-comment-checker` v0.4.1). The binary contains the regex `(?i)^[\s#/*-]*note:\s*\w` which matches ANY comment starting with "Note:" followed by a word character. This flags legitimate technical notes like:
- `// Note: Thread-safe by design`
- `# Note: See RFC 7231 for details`
- `// Note: This edge case requires special handling`
Full list of 24 embedded regex patterns extracted from the binary:
- Add `exclude_patterns` config to comment-checker schema, allowing users to whitelist comment prefixes (e.g. `["^Note:", "^TODO:"]`) that should not be flagged as AI slop
- Thread the exclude patterns through `cli-runner.ts` and `cli.ts` to the Go binary via `--exclude-pattern` flags
- Add test cases covering false positive scenarios: legitimate technical notes, RFC references, and AI memo detection with/without exclusions
## Context
The comment-checker Go binary (`go-claude-code-comment-checker` v0.4.1) contains the regex `(?i)^[\s#/*-]*note:\s*\w` which matches ALL comments starting with "Note:" followed by a word character. This produces false positives for legitimate technical comments:
```typescript
// Note: Thread-safe by design <- flagged as AI slop
#Note: SeeRFC7231fordetails<-flaggedasAIslop
// Note: This edge case requires... <- flagged as AI slop
```
These are standard engineering comments, not AI agent memos.
## Changes
| File | Change |
|------|--------|
| `src/config/schema/comment-checker.ts` | Add `exclude_patterns: string[]` optional field |
| `src/hooks/comment-checker/cli.ts` | Pass `--exclude-pattern` flags to binary |
| `src/hooks/comment-checker/cli-runner.ts` | Thread `excludePatterns` through `processWithCli` and `processApplyPatchEditsWithCli` |
| `src/hooks/comment-checker/hook.ts` | Pass `config.exclude_patterns` to CLI runner calls |
| `src/hooks/comment-checker/cli.test.ts` | Add 6 new test cases for false positive scenarios |
| `src/hooks/comment-checker/hook.apply-patch.test.ts` | Add test verifying exclude_patterns config threading |
## Usage
```jsonc
// .opencode/oh-my-opencode.jsonc
{
"comment_checker":{
"exclude_patterns":["^Note:","^TODO:","^FIXME:"]
}
}
```
## Related
- Go binary repo: `code-yeongyu/go-claude-code-comment-checker` (needs corresponding `--exclude-pattern` flag support)
| Oracle (code quality) | Code quality check | Factory pattern consistency, no catch-all files, <200 LOC |
| Oracle (security) | Security review | Regex patterns are user-supplied - verify no ReDoS risk from config |
| Hephaestus (QA) | Hands-on execution | Run tests, verify mock binary tests actually exercise the exclude flow |
| Hephaestus (context) | Context mining | Check git history for related changes, verify no conflicting PRs |
### Potential review-work flags
1.**ReDoS concern**: User-supplied regex patterns in `exclude_patterns` could theoretically cause ReDoS in the Go binary. Mitigation: the patterns are passed as CLI args, Go's `regexp` package is RE2-based (linear time guarantee).
2.**Breaking change check**: Adding optional field to config schema is non-breaking (Zod `z.optional()` fills default).
3.**Go binary dependency**: The `--exclude-pattern` flag must exist in the Go binary for this to work. If the binary doesn't support it yet, the patterns are silently ignored (binary treats unknown flags differently).
### Failure handling
- If any Oracle flags issues: address feedback, push new commit, re-run review-work
- If Hephaestus QA finds test gaps: add missing tests, push, re-verify
## Gate C: Cubic (`cubic-dev-ai[bot]`)
### Expected review focus
- Schema change additive and backward-compatible
- Parameter threading is mechanical and low-risk
- Tests use mock binaries (shell scripts) - standard project pattern per `cli.test.ts`
{"text":"Plan uses git worktree in a sibling directory","passed":false,"evidence":"git checkout -b, no worktree"},
{"text":"References actual comment-checker hook files","passed":true,"evidence":"Deep analysis of Go binary, tree-sitter, formatter.go, agent_memo.go with line numbers"},
{"text":"Adds test cases for Note: false positive scenarios","passed":true,"evidence":"Detailed test cases distinguishing legit vs AI slop patterns"},
{"text":"Verification loop includes all 3 gates","passed":false,"evidence":"Only bun test and typecheck. No review-work or Cubic."},
{"text":"Only modifies regex and adds tests — no unrelated changes","passed":true,"evidence":"Adds allowed-prefix filter module — focused approach with config extension"}
The comment-checker hook delegates to an external Go binary (`code-yeongyu/go-claude-code-comment-checker`). The binary:
1. Detects ALL comments in written/edited code using tree-sitter
2. Filters out only BDD markers, linter directives, and shebangs
3. Flags every remaining comment as problematic (exit code 2)
4. In the output formatter (`formatter.go`), uses `AgentMemoFilter` to categorize comments for display
The `AgentMemoFilter` in `pkg/filters/agent_memo.go` contains the overly aggressive regex:
```go
regexp.MustCompile(`(?i)^[\s#/*-]*note:\s*\w`),
```
This matches ANY comment starting with `Note:` (case-insensitive) followed by a word character, causing legitimate comments like `// Note: Thread-safe implementation` or `// NOTE: See RFC 7231` to be classified as "AGENT MEMO" AI slop with an aggressive warning banner.
Additionally, the binary flags ALL non-filtered comments (not just agent memos), so even without the `Note:` regex, `// Note: ...` comments would still be flagged as generic "COMMENT DETECTED."
## Architecture Understanding
```
TypeScript (oh-my-openagent) Go Binary (go-claude-code-comment-checker)
Add `allowed_comment_prefixes` field with sensible defaults. This lets users configure which comment prefixes should be treated as legitimate (not AI slop).
### Step 3: Add a post-processing filter in cli-runner.ts
After the Go binary returns its result, parse the stderr message to identify and suppress comments that match allowed prefixes. The binary's output contains XML like:
The comment-checker hook's upstream Go binary (`go-claude-code-comment-checker`) flags ALL non-filtered comments as problematic. Its `AgentMemoFilter` regex `(?i)^[\s#/*-]*note:\s*\w` classifies any `Note:` comment as AI-generated "agent memo" slop, triggering an aggressive warning banner.
This causes false positives for legitimate, widely-used comment patterns:
```typescript
// Note: Thread-safe implementation required due to concurrent access
// NOTE: See RFC 7231 section 6.5.4 for 404 semantics
// Note: This timeout matches the upstream service SLA
```
These are standard engineering documentation patterns, not AI slop.
## Solution
Rather than waiting for an upstream binary fix, this PR adds a configurable **post-processing filter** on the TypeScript side:
2.**Filter**: After the Go binary returns flagged comments, `filterAllowedComments()` parses the XML output and suppresses comments matching allowed prefixes
3.**Behavior**: If ALL flagged comments are legitimate → suppress entire warning. If mixed → remove only the legitimate entries from the XML, keep the warning for actual slop.
description: "Full PR lifecycle: git worktree → implement → atomic commits → PR creation → verification loop (CI + review-work + Cubic approval) → merge. Keeps iterating until ALL gates pass and PR is merged. Worktree auto-cleanup after merge. Use whenever implementation work needs to land as a PR. Triggers: 'create a PR', 'implement and PR', 'work on this and make a PR', 'implement issue', 'land this as a PR', 'work-with-pr', 'PR workflow', 'implement end to end', even when user just says 'implement X' if the context implies PR delivery."
---
# Work With PR — Full PR Lifecycle
You are executing a complete PR lifecycle: from isolated worktree setup through implementation, PR creation, and an unbounded verification loop until the PR is merged. The loop has three gates — CI, review-work, and Cubic — and you keep fixing and pushing until all three pass simultaneously.
<architecture>
```
Phase 0: Setup → Branch + worktree in sibling directory
Phase 1: Implement → Do the work, atomic commits
Phase 2: PR Creation → Push, create PR targeting dev
Phase 3: Verify Loop → Unbounded iteration until ALL gates pass:
└─ Gate C: Cubic → cubic-dev-ai[bot] "No issues found"
Phase 4: Merge → Squash merge, worktree cleanup
```
</architecture>
---
## Phase 0: Setup
Create an isolated worktree so the user's main working directory stays clean. This matters because the user may have uncommitted work, and checking out a branch would destroy it.
If user provides a branch name, use it. Otherwise, derive from the task:
```bash
# Auto-generate: feature/short-description or fix/short-description
BRANCH_NAME="feature/$(echo"$TASK_SUMMARY"| tr '[:upper:] ''[:lower:]-'| head -c 50)"
git fetch origin "$BASE_BRANCH"
git branch "$BRANCH_NAME""origin/$BASE_BRANCH"
```
### 3. Create worktree
Place worktrees as siblings to the repo — not inside it. This avoids git nested repo issues and keeps the working tree clean.
```bash
WORKTREE_PATH="../${REPO_NAME}-wt/${BRANCH_NAME}"
mkdir -p "$(dirname "$WORKTREE_PATH")"
git worktree add "$WORKTREE_PATH""$BRANCH_NAME"
```
### 4. Set working context
All subsequent work happens inside the worktree. Install dependencies if needed:
```bash
cd"$WORKTREE_PATH"
# If bun project:
[ -f "bun.lock"]&& bun install
```
</setup>
---
## Phase 1: Implement
Do the actual implementation work inside the worktree. The agent using this skill does the work directly — no subagent delegation for the implementation itself.
**Scope discipline**: For bug fixes, stay minimal. Fix the bug, add a test for it, done. Do not refactor surrounding code, add config options, or "improve" things that aren't broken. The verification loop will catch regressions — trust the process.
<implementation>
### Commit strategy
Use the git-master skill's atomic commit principles. The reason for atomic commits: if CI fails on one change, you can isolate and fix it without unwinding everything.
```
3+ files changed → 2+ commits minimum
5+ files changed → 3+ commits minimum
10+ files changed → 5+ commits minimum
```
Each commit should pair implementation with its tests. Load `git-master` skill when committing:
```
task(category="quick", load_skills=["git-master"], prompt="Commit the changes atomically following git-master conventions. Repository is at {WORKTREE_PATH}.")
```
### Pre-push local validation
Before pushing, run the same checks CI will run. Catching failures locally saves a full CI round-trip (~3-5 min):
```bash
bun run typecheck
bun test
bun run build
```
Fix any failures before pushing. Each fix-commit cycle should be atomic.
</implementation>
---
## Phase 2: PR Creation
<pr_creation>
### Push and create PR
```bash
git push -u origin "$BRANCH_NAME"
```
Create the PR using the project's template structure:
```bash
gh pr create \
--base "$BASE_BRANCH"\
--head "$BRANCH_NAME"\
--title "$PR_TITLE"\
--body "$(cat <<'EOF'
## Summary
[1-3 sentences describing what this PR does and why]
## Changes
[Bullet list of key changes]
## Testing
- `bun run typecheck` ✅
- `bun test` ✅
- `bun run build` ✅
## Related Issues
[Link to issue if applicable]
EOF
)"
```
Capture the PR number:
```bash
PR_NUMBER=$(gh pr view --json number -q .number)
```
</pr_creation>
---
## Phase 3: Verification Loop
This is the core of the skill. Three gates must ALL pass for the PR to be ready. The loop has no iteration cap — keep going until done. Gate ordering is intentional: CI is cheapest/fastest, review-work is most thorough, Cubic is external and asynchronous.
<verify_loop>
```
while true:
1. Wait for CI → Gate A
2. If CI fails → read logs, fix, commit, push, continue
6. If Cubic has issues → fix issues, commit, push, continue
7. All three pass → break
```
### Gate A: CI Checks
CI is the fastest feedback loop. Wait for it to complete, then parse results.
```bash
# Wait for checks to start (GitHub needs a moment after push)
# Then watch for completion
gh pr checks "$PR_NUMBER" --watch --fail-fast
```
**On failure**: Get the failed run logs to understand what broke:
```bash
# Find the failed run
RUN_ID=$(gh run list --branch "$BRANCH_NAME" --status failure --json databaseId --jq '.[0].databaseId')
# Get failed job logs
gh run view "$RUN_ID" --log-failed
```
Read the logs, fix the issue, commit atomically, push, and re-enter the loop.
### Gate B: review-work
The review-work skill launches 5 parallel sub-agents (goal verification, QA, code quality, security, context mining). All 5 must pass.
Invoke review-work after CI passes — there's no point reviewing code that doesn't build:
```
task(
category="unspecified-high",
load_skills=["review-work"],
run_in_background=false,
description="Post-implementation review of PR changes",
prompt="Review the implementation work on branch {BRANCH_NAME}. The worktree is at {WORKTREE_PATH}. Goal: {ORIGINAL_GOAL}. Constraints: {CONSTRAINTS}. Run command: bun run dev (or as appropriate)."
)
```
**On failure**: review-work reports blocking issues with specific files and line numbers. Fix each blocking issue, commit, push, and re-enter the loop from Gate A (since code changed, CI must re-run).
### Gate C: Cubic Approval
Cubic (`cubic-dev-ai[bot]`) is an automated review bot that comments on PRs. It does NOT use GitHub's APPROVED review state — instead it posts comments with issue counts and confidence scores.
**Approval signal**: The latest Cubic comment contains `**No issues found**` and confidence `**5/5**`.
**Issue signal**: The comment lists issues with file-level detail.
```bash
# Get the latest Cubic review
CUBIC_REVIEW=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/reviews"\
--jq '[.[] | select(.user.login == "cubic-dev-ai[bot]")] | last | .body')
# Check if approved
ifecho"$CUBIC_REVIEW"| grep -q "No issues found";then
echo"Cubic: APPROVED"
else
echo"Cubic: ISSUES FOUND"
echo"$CUBIC_REVIEW"
fi
```
**On issues**: Cubic's review body contains structured issue descriptions. Parse them, determine which are valid (some may be false positives), fix the valid ones, commit, push, re-enter from Gate A.
Cubic reviews are triggered automatically on PR updates. After pushing a fix, wait for the new review to appear before checking again. Use `gh api` polling with a conditional loop:
```bash
# Wait for new Cubic review after push
PUSH_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
while true;do
LATEST_REVIEW_TIME=$(gh api "repos/${REPO}/pulls/${PR_NUMBER}/reviews"\
--jq '[.[] | select(.user.login == "cubic-dev-ai[bot]")] | last | .submitted_at')
if[["$LATEST_REVIEW_TIME" > "$PUSH_TIME"]];then
break
fi
# Use gh api call itself as the delay mechanism — each call takes ~1-2s
1. Fix ONLY the issues identified by the failing gate
2. Commit atomically (one logical fix per commit)
3. Push
4. Re-enter from Gate A (code changed → full re-verification)
Avoid the temptation to "improve" unrelated code during fix iterations. Scope creep in the fix loop makes debugging harder and can introduce new failures.
</verify_loop>
---
## Phase 4: Merge & Cleanup
Once all three gates pass:
<merge_cleanup>
### Merge the PR
```bash
# Squash merge to keep history clean
gh pr merge "$PR_NUMBER" --squash --delete-branch
```
### Clean up the worktree
The worktree served its purpose — remove it to avoid disk bloat:
```bash
cd"$ORIGINAL_DIR"# Return to original working directory
- Multiple unrelated responsibilities mixed together
**If you find mixed logic in index.ts**: Extract each responsibility into its own dedicated file BEFORE making any other changes. This is not optional.
## Rule 2: No Catch-All Files — utils.ts / service.ts are CODE SMELLS
A single `utils.ts`, `helpers.ts`, `service.ts`, or `common.ts` is a **gravity well** — every unrelated function gets tossed in, and it grows into an untestable, unreviewable blob.
**These file names are BANNED as top-level catch-alls.** Instead:
| `helpers.ts` with 15 unrelated exports | One file per logical domain |
**Design for reusability from the start.** Each module should be:
- **Independently importable** — no consumer should need to pull in unrelated code
- **Self-contained** — its dependencies are explicit, not buried in a shared grab-bag
- **Nameable by purpose** — the filename alone tells you what it does
If you catch yourself typing `utils.ts` or `service.ts`, STOP and name the file after what it actually does.
## Rule 3: Single Responsibility Principle — ABSOLUTE
Every `.ts` file MUST have exactly ONE clear, nameable responsibility.
**Self-test**: If you cannot describe the file's purpose in ONE short phrase (e.g., "parses YAML frontmatter", "matches rules against file paths"), the file does too much. Split it.
| Signal | Action |
|--------|--------|
| File has 2+ unrelated exported functions | **SPLIT NOW** — each into its own module |
| File mixes I/O with pure logic | **SPLIT NOW** — separate side effects from computation |
| File has both types and implementation | **SPLIT NOW** — types.ts + implementation.ts |
| You need to scroll to understand the file | **SPLIT NOW** — it's too large |
## Rule 4: 200 LOC Hard Limit — CODE SMELL DETECTOR
Any `.ts`/`.tsx` file exceeding **200 lines of code** (excluding prompt strings, template literals containing prompts, and `.md` content) is an **immediate code smell**.
**When you detect a file > 200 LOC**:
1.**STOP** current work
2.**Identify** the multiple responsibilities hiding in the file
3.**Extract** each responsibility into a focused module
4.**Verify** each resulting file is < 200 LOC and has a single purpose
5.**Resume** original work
Prompt-heavy files (agent definitions, skill definitions) where the bulk of content is template literal prompt text are EXEMPT from the LOC count — but their non-prompt logic must still be < 200 LOC.
### How to Count LOC
**Count these** (= actual logic):
- Import statements
- Variable/constant declarations
- Function/class/interface/type definitions
- Control flow (`if`, `for`, `while`, `switch`, `try/catch`)
- Expressions, assignments, return statements
- Closing braces `}` that belong to logic blocks
**Exclude these** (= not logic):
- Blank lines
- Comment-only lines (`//`, `/* */`, `/** */`)
- Lines inside template literals that are prompt/instruction text (e.g., the string body of `` const prompt = `...` ``)
- Lines inside multi-line strings used as documentation/prompt content
**Quick method**: Read the file → subtract blank lines, comment-only lines, and prompt string content → remaining count = LOC.
**Example**:
```typescript
// 1 import { foo } from "./foo"; ← COUNT
// 2 ← SKIP (blank)
// 3 // Helper for bar ← SKIP (comment)
// 4 export function bar(x: number) { ← COUNT
// 5 const prompt = ` ← COUNT (declaration)
// 6 You are an assistant. ← SKIP (prompt text)
// 7 Follow these rules: ← SKIP (prompt text)
// 8 `; ← COUNT (closing)
// 9 return process(prompt, x); ← COUNT
// 10 } ← COUNT
```
→ LOC = **5** (lines 1, 4, 5, 9, 10). Not 10.
When in doubt, **round up** — err on the side of splitting.
## How to Apply
When reading, writing, or editing ANY `.ts`/`.tsx` file:
1.**Check the file you're touching** — does it violate any rule above?
2.**If YES** — refactor FIRST, then proceed with your task
3.**If creating a new file** — ensure it has exactly one responsibility and stays under 200 LOC
4.**If adding code to an existing file** — verify the addition doesn't push the file past 200 LOC or add a second responsibility. If it does, extract into a new module.
# Pre-Publish BLOCK Issues: Fix ALL Before Release
Two independent pre-publish reviews (Opus 4.6 + GPT-5.4) both concluded **BLOCK -- do not publish**. You must fix ALL blocking issues below using UltraBrain parallel agents. Work TDD-style: write/update tests first, then fix, verify tests pass.
## Strategy
Use ultrawork (ulw) to spawn UltraBrain agents in parallel. Each UB agent gets a non-overlapping scope. After all agents complete, run bun test to verify everything passes. Commit atomically per fix group.
---
## CRITICAL BLOCKERS (must fix -- 6 items)
### C1: Hashline Backward Compatibility
**Problem:** Strict whitespace hashing in hashline changes LINE#ID values for indented lines. Breaks existing anchors in cached/persisted edit operations.
**Fix:** Add a compatibility shim -- when lookup by new hash fails, fall back to legacy hash (without strict whitespace). Or version the hash format.
**Files:** Look for hashline-related files in src/tools/ or src/shared/
### C2: OpenAI-Only Model Catalog Broken with OpenCode-Go
**Problem:** isOpenAiOnlyAvailability() does not exclude availability.opencodeGo. When OpenCode-Go is present, OpenAI-only detection is wrong -- models get misrouted.
**Fix:** Add !availability.opencodeGo check to isOpenAiOnlyAvailability().
**Files:** Model/provider system files -- search for isOpenAiOnlyAvailability
### C3: CLI/Runtime Model Table Divergence
**Problem:** Model tables disagree between CLI install-time and runtime:
- ultrabrain: gpt-5.3-codex in CLI vs gpt-5.4 in runtime
- atlas: claude-sonnet-4-5 in CLI vs claude-sonnet-4-6 in runtime
- unspecified-high also diverges
**Fix:** Reconcile all model tables. Pick the correct model for each and make CLI + runtime match.
**Files:** Search for model table definitions, agent configs, CLI model references
> [](https://github.com/code-yeongyu/oh-my-opencode/releases/tag/v3.0.0)
> > **Oh My OpenCode 3.0が正式リリースされました!`oh-my-opencode@latest`を使用してインストールしてください。**
> **ohmyopencode.com은 이 프로젝트와 제휴 관계가 아닙니다.** 우리는 해당 사이트를 운영하거나 지지하지 않습니다.
>
> OhMyOpenCode는 **무료 오픈 소스**입니다. "공식"을 표방하는 제3자 사이트에서 설치 프로그램을 다운로드하거나 결제 정보를 입력하지 마십시오.
>
> 사칭 사이트는 유료 벽 뒤에 있어 **배포하는 내용을 확인할 수 없습니다.** 해당 사이트의 다운로드는 **잠재적으로 위험한 것으로 간주**하세요.
>
> ✅ 공식 다운로드: https://github.com/code-yeongyu/oh-my-opencode/releases
> 핵심 메인테이너 Q가 부상을 입어, 이번 주에는 이슈/PR 응답 및 릴리스가 지연될 수 있습니다.
>양해와 응원에 감사드립니다.
> [!NOTE]
>
> [](https://sisyphuslabs.ai)
> > **Sisyphus의 완전한 제품화 버전을 구축하여 프론티어 에이전트의 미래를 정의하고 있습니다. <br />[여기서](https://sisyphuslabs.ai) 대기 명단에 등록하세요.**
>
> [!TIP]
> 저희와 함께 하세요!
>
> [](https://github.com/code-yeongyu/oh-my-opencode/releases/tag/v3.0.0)
> > **Oh My OpenCode 3.0이 정식 출시되었습니다! `oh-my-opencode@latest`를 사용하여 설치하세요.**
> | [<img alt="Discord link" src="https://img.shields.io/discord/1452487457085063218?color=5865F2&label=discord&labelColor=black&logo=discord&logoColor=white&style=flat-square" width="156px" />](https://discord.gg/PUwSMR9XNk) | [Discord 커뮤니티](https://discord.gg/PUwSMR9XNk)에 가입하여 기여자 및 다른 `oh-my-openagent` 사용자들과 소통하세요. |
> | :-----| :----- |
> | [<img alt="X link" src="https://img.shields.io/badge/Follow-%40justsisyphus-00CED1?style=flat-square&logo=x&labelColor=black" width="156px" />](https://x.com/justsisyphus) | `oh-my-opencode`에 대한 뉴스와 업데이트가 제 X 계정에 게시되었습니다. <br /> 실수로 정지된 이후, [@justsisyphus](https://x.com/justsisyphus)가 제 대신 업데이트를 게시합니다. |
> | [<img alt="GitHub Follow" src="https://img.shields.io/github/followers/code-yeongyu?style=flat-square&logo=github&labelColor=black&color=24292f" width="156px" />](https://github.com/code-yeongyu) | 더 많은 프로젝트를 위해 GitHub에서 [@code-yeongyu](https://github.com/code-yeongyu)를 팔로우하세요. |
> | [<img alt="X link" src="https://img.shields.io/badge/Follow-%40justsisyphus-00CED1?style=flat-square&logo=x&labelColor=black" width="156px" />](https://x.com/justsisyphus) | `oh-my-openagent`에 대한 소식과 업데이트는 제 X 계정에 올라왔었지만, <br /> 실수로 정지된 이후에는 [@justsisyphus](https://x.com/justsisyphus)가 대신 업데이트를 게시하고 있습니다. |
> | [<img alt="GitHub Follow" src="https://img.shields.io/github/followers/code-yeongyu?style=flat-square&logo=github&labelColor=black&color=24292f" width="156px" />](https://github.com/code-yeongyu) | 더 많은 프로젝트를 보려면 GitHub에서 [@code-yeongyu](https://github.com/code-yeongyu)를 팔로우하세요. |
<!-- <CENTERED SECTION FOR GITHUB DISPLAY> -->
<div align="center">
[](https://github.com/code-yeongyu/oh-my-opencode#oh-my-opencode)
> 이것은 코딩을 스테로이드로 만드는 것 — 실제로 작동하는 `oh-my-opencode`입니다. 백그라운드 에이전트 실행, 오라클, 라이브러리언, 프론트엔드 엔지니어와 같은 전문 에이전트 호출. 정교하게 제작된 LSP/AST 도구, 큐레이팅된 MCP, 완전한 Claude Code 호환 계층 사용.
# Claude OAuth 액세스 공지
## TL;DR
> Q. oh-my-opencode를 사용할 수 있나요?
네.
> Q. Claude Code 구독과 함께 사용할 수 있나요?
기술적으로는 가능합니다. 하지만 사용을 추천할 수는 없습니다.
## FULL
> 2026년 1월 현재, Anthropic은 ToS 위반을 이유로 제3자 OAuth 액세스를 제한했습니다.
> Anthropic은 당신을 가두고 싶어 합니다. Claude Code는 멋진 감옥이지만, 여전히 감옥일 뿐이죠.
>
> [**Anthropic은 이 프로젝트 oh-my-opencode를 opencode 차단의 정당화로 인용했습니다.**](https://x.com/thdxr/status/2010149530486911014)
>
> 실제로 커뮤니티에는 Claude Code의 oauth 요청 서명을 위조하는 일부 플러그인이 존재합니다.
>
> 기술적 감지 여부와 관계없이 이러한 도구는 작동할 수 있지만, 사용자는 ToS 영향을 인식해야 하며 개인적으로는 사용을 추천하지 않습니다.
>
> 이 프로젝트는 공식이 아닌 도구 사용으로 발생하는 모든 문제에 대해 책임지지 않으며, **우리는 해당 oauth 시스템에 대한 사용자 정의 구현이 없습니다.**
> 우리는 여기서 그런 가두리를 하지 않습니다. Claude로 오케스트레이션하고, GPT로 추론하고, Kimi로 속도 내고, Gemini로 비전 처리한다. 미래는 하나의 승자를 고르는 게 아니라 전부를 오케스트레이션하는 거다. 모델은 매달 싸지고, 매달 똑똑해진다. 어떤 단일 프로바이더도 독재하지 못할 것이다. 우리는 그 열린 시장을 위해 만들고 있다.
> "이것 덕분에 Cursor 구독을 취소했습니다. 오픈소스 커뮤니티에서 믿을 수 없는 일들이 일어나고 있습니다." - [Arthur Guiot](https://x.com/arthur_guiot/status/2008736347092382053?s=20)
> "이것 덕분에 Cursor 구독을 취소했습니다. 오픈소스 커뮤니티에서 믿을 수 없는 일들이 일어나고 있네요." - [Arthur Guiot](https://x.com/arthur_guiot/status/2008736347092382053?s=20)
> "Claude Code가 7일 동안 하는 일을 인간은 3개월 동안 한다면, Sisyphus는 1시간 만에 합니다. 작업이 완료될 때까지 작동합니다. 규율 있는 에이전트입니다." — B, 양적 연구원
> "Claude Code가 인간이 3개월 걸릴 일을 7일 만에 한다면, Sisyphus는 1시간 만에 해냅니다. 작업이 끝날 때까지 그냥 계속 알아서 작동합니다. 이건 정말 규율이 잡힌 에이전트예요." <br/>- B, Quant Researcher
> "Oh My Opencode로 하루 만에 8000개의 eslint 경고를 해결했습니다" — [Jacob Ferrari](https://x.com/jacobferrari_/status/2003258761952289061)
> "Oh My OpenAgent로 하루 만에 eslint 경고 8000개를 해결했습니다." <br/>- [Jacob Ferrari](https://x.com/jacobferrari_/status/2003258761952289061)
> "Ohmyopencode와 ralph 루프를 사용하여 하룻밤 사이에 45,000줄의 tauri 앱을 SaaS 웹앱으로 변환했습니다. 인터뷰 프롬프트로 시작하여 질문에 대한 등급과 추천을 물어봤습니다. 그것이 작동하는 모습을 보는 것은 놀라웠고, 이 아침에 기본적으로 작동하는 웹사이트로 깨어나는 것이었습니다!" - [James Hargis](https://x.com/hargabyte/status/2007299688261882202)
> "Ohmyopencode와 ralph loop를 써서 45k 라인짜리 tauri 앱을 하룻밤 만에 SaaS 웹앱으로 변환했어요. 인터뷰 모드로 시작해서, 제가 쓴 프롬프트에 대해 질문하고 추천을 부탁했죠. 일하는 걸 지켜보는 것도 재밌었고, 아침에 일어났더니 웹사이트가 대부분 돌아가고 있는 걸 보고 경악했습니다!" - [James Hargis](https://x.com/hargabyte/status/2007299688261882202)
> "oh-my-opencode를 사용하세요, 다시는 돌아갈 수 없을 것입니다" — [d0t3ch](https://x.com/d0t3ch/status/2001685618200580503)
> "oh-my-openagent 쓰세요, 다시는 예전으로 못 돌아갑니다." <br/>- [d0t3ch](https://x.com/d0t3ch/status/2001685618200580503)
> "아직 왜 그렇게 훌륭한지 정확히 설명할 수 없지만, 개발 경험이 완전히 다른 차원에 도달했습니다." - [
> "뭐가 이렇게 대단한 건지 아직 정확하게 말로 표현하긴 어려운데, 개발 경험 자체가 완전히 다른 차원에 도달해버렸어요." - [苔硯:こけすずり](https://x.com/kokesuzuri/status/2008532913961529372?s=20)
> "이번 주말에 open code, oh my opencode, supermemory으로 마인크래프트/소울스 같은 기괴한 것을 만들고 있습니다."
> "점심 후 산책을 가는 동안 웅크림 애니메이션을 추가하도록 요청 중입니다. [동영상]" - [MagiMetal](https://x.com/MagiMetal/status/2005374704178373023)
> "주말에 마인크래프트/소울라이크 같은 괴물 같은 걸 만들어보려고 open code, oh my openagent, supermemory로 실험 중입니다. 점심 먹고 산책 다녀오는 동안 앉기 애니메이션을 추가하라고 시켜뒀어요. [영상]" - [MagiMetal](https://x.com/MagiMetal/status/2005374704178373023)
> "여러분이 이것을 핵심에 통합하고 그를 채용해야 합니다. 진지합니다. 정말, 정말, 정말 훌륭합니다." — Henning Kilset
> "이걸 코어에 당겨오고 저 사람 스카우트해야 돼요. 진심으로. 이거 진짜, 진짜, 진짜 좋습니다." <br/>- Henning Kilset
> "그를 설득할 수 있다면 @yeon_gyu_kim을 고용하세요, 이 사람은 opencode를 혁신했습니다." — [mysticaltech](https://x.com/mysticaltech/status/2001858758608376079)
> "설득할 수만 있다면 @yeon_gyu_kim 채용하세요, 이 사람이 opencode를 혁명적으로 바꿨습니다." <br/>- [mysticaltech](https://x.com/mysticaltech/status/2001858758608376079)
> "Oh My OpenCode는 실제로 미칩니다" - [YouTube - Darren Builds AI](https://www.youtube.com/watch?v=G_Snfh2M41M)
> "Oh My OpenAgent는 진짜 미쳤다" - [YouTube - Darren Builds AI](https://www.youtube.com/watch?v=G_Snfh2M41M)
---
## 목차
# Oh My OpenAgent
- [Oh My OpenCode](#oh-my-opencode)
- [이 README를 읽지 않고 건너뛰세요](#이-readme를-읽지-않고-건너뛰세요)
- [에이전트의 시대입니다](#에이전트의-시대입니다)
- [🪄 마법의 단어: `ultrawork`](#-마법의-단어-ultrawork)
- [읽고 싶은 분들을 위해: Sisyphus를 소개합니다](#읽고-싶은-분들을-위해-sisyphus를-소개합니다)
- [그냥 설치하세요](#그냥-설치하세요)
- [설치](#설치)
- [인간을 위한](#인간을-위한)
- [LLM 에이전트를 위한](#llm-에이전트를-위한)
- [제거](#제거)
- [기능](#기능)
- [구성](#구성)
- [JSONC 지원](#jsonc-지원)
- [Google 인증](#google-인증)
- [에이전트](#에이전트)
- [권한 옵션](#권한-옵션)
- [내장 스킬](#내장-스킬)
- [Git Master](#git-master)
- [Sisyphus 에이전트](#sisyphus-에이전트)
- [백그라운드 작업](#백그라운드-작업)
- [카테고리](#카테고리)
- [훅](#훅)
- [MCP](#mcp)
- [LSP](#lsp)
- [실험적 기능](#실험적-기능)
- [환경 변수](#환경-변수)
- [작성자의 메모](#작성자의-메모)
- [경고](#경고)
- [다음 기업 전문가들이 사랑합니다](#다음-기업-전문가들이-사랑합니다)
Claude Code, Codex, 온갖 OSS 모델들 사이에서 헤매고 있나요. 워크플로우 설정하랴, 에이전트 디버깅하랴 피곤할 겁니다.
- 왜 좋은지, 왜 그저 또 다른 상용구 설정이 아닌지, 실제로 무엇이 좋아지는지 물어보세요.
- LLM 에이전트라면 이 프로젝트로 지금 가능한 것을 사용자에게 설명하세요.
다음과 같이:
```
Hey please read this readme and tell me why it is different from other agent harness? why is this actually good? https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/refs/heads/dev/README.md
```
### 🪄 마법의 단어: `ultrawork`
**이 모든 것을 읽고 싶지 않나요? 프롬프트에 `ultrawork`(또는 `ulw`)를 포함하세요.**
그게 전부입니다. 우리가 제공하는 모든 기능이 마법처럼 작동합니다 — 병렬 에이전트, 백그라운드 작업, 깊은 탐색, 완료될 때까지 끈질긴 실행. 에이전트가 나머지를 자동으로 파악합니다.
### 읽고 싶은 분들을 위해: Sisyphus를 소개합니다

그리스 신화에서 시시포스는 신들을 속인 형벌로 영원히 바위를 언덕 위로 굴려야 했습니다. LLM 에이전트는 정말 잘못한 것이 없지만, 그들도 매일 자신의 "돌" — 생각을 굴립니다.
내 삶도 다르지 않습니다. 돌이켜보면 우리는 이 에이전트들과 그리 다르지 않습니다.
**맞습니다! LLM 에이전트는 우리와 다르지 않습니다. 훌륭한 도구와 확고한 팀원을 제공하면 우리만큼 훌륭한 코드를 작성하고 똑같이 훌륭하게 작업할 수 있습니다.**
우리의 주요 에이전트를 만나보세요: Sisyphus (Opus 4.5 High). 아래는 Sisyphus가 그 바위를 굴리는 데 사용하는 도구입니다.
*아래의 모든 것은 사용자 정의 가능합니다. 원하는 것을 가져가세요. 모든 기능은 기본적으로 활성화됩니다. 아무것도 할 필요가 없습니다. 포함되어 있으며, 즉시 작동합니다.*
- Sisyphus의 팀원 (큐레이팅된 에이전트)
- Oracle: 디자인, 디버깅 (GPT 5.2 Medium)
- Frontend UI/UX Engineer: 프론트엔드 개발 (Gemini 3 Pro)
- Librarian: 공식 문서, 오픈 소스 구현, 코드베이스 탐색 (Claude Sonnet 4.5)
- Explore: 엄청나게 빠른 코드베이스 탐색 (Contextual Grep) (Grok Code)
- 완전한 LSP / AstGrep 지원: 결정적으로 리팩토링합니다.
- TODO 연속 강제: 에이전트가 중간에 멈추면 계속하도록 강제합니다. **이것이 Sisyphus가 그 바위를 굴리게 하는 것입니다.**
- 주석 검사기: AI가 과도한 주석을 추가하는 것을 방지합니다. Sisyphus가 생성한 코드는 인간이 작성한 것과 구별할 수 없어야 합니다.
| 🤖 | **기강 잡힌 에이전트 (Discipline Agents)** | Sisyphus가 Hephaestus, Oracle, Librarian, Explore를 오케스트레이션합니다. 완전한 AI 개발팀이 병렬로 돌아갑니다. |
| ⚡ | **`ultrawork` / `ulw`** | 단어 하나면 됩니다. 모든 에이전트가 활성화되고 다 끝날 때까지 멈추지 않습니다. |
| 🚪 | **[IntentGate](https://factory.ai/news/terminal-bench)** | 사용자의 진짜 의도를 분석한 뒤 분류하거나 행동합니다. 더 이상 문자 그대로 오해해서 헛짓거리하는 일이 없습니다. |
| 🔗 | **해시 기반 편집 툴** | `LINE#ID` 콘텐츠 해시로 모든 변경 사항을 검증합니다. stale-line 에러 0%. [oh-my-pi](https://github.com/can1357/oh-my-pi)에서 영감을 받았습니다. [하니스 프로블러 →](https://blog.can.ac/2026/02/12/the-harness-problem/) |
| 🛠️ | **LSP + AST-Grep** | 워크스페이스 단위 이름 변경, 빌드 전 진단, AST 기반 재작성. 에이전트에게 IDE급 정밀도를 제공합니다. |
| 🧠 | **백그라운드 에이전트** | 5명 이상의 전문가를 병렬로 투입합니다. 컨텍스트는 가볍게 유지하고 결과는 준비될 때 받습니다. |
| 📚 | **기본 내장 MCP** | Exa(웹 검색), Context7(공식 문서), Grep.app(GitHub 검색). 항상 켜져 있습니다. |
| 🔁 | **Ralph Loop / `/ulw-loop`** | 자기 참조 루프. 100% 완료될 때까지 절대 멈추지 않습니다. |
| ✅ | **Todo 강제 집행** | 에이전트가 딴짓한다고요? 시스템이 멱살 잡고 끌고 옵니다. 당신의 작업은 무조건 끝납니다. |
| 💬 | **주석 검사기** | 주석에 AI 냄새나는 헛소리를 빼버립니다. 시니어 개발자가 짠 것 같은 코드가 됩니다. |
| 🖥️ | **Tmux 연동** | 완전한 인터랙티브 터미널. REPL, 디버거, TUI 앱들 모두 실시간으로 돌아갑니다. |
| 🔌 | **Claude Code 호환성** | 기존 훅, 명령어, 스킬, MCP, 플러그인? 전부 여기서 그대로 돌아갑니다. |
| 🎯 | **스킬 내장 MCP** | 스킬이 자기만의 MCP 서버를 들고 다닙니다. 컨텍스트가 부풀어 오르지 않습니다. |
| 📋 | **Prometheus 플래너** | 인터뷰 모드로 코드 한 줄 만지기 전에 전략적인 계획부터 세웁니다. |
| 🔍 | **`/init-deep`** | 프로젝트 전체에 걸쳐 계층적인 `AGENTS.md` 파일을 자동 생성합니다. 토큰 효율과 에이전트 성능 둘 다 잡습니다. |
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**)는 당신의 메인 오케스트레이터입니다. 공격적인 병렬 실행으로 계획을 세우고, 전문가들에게 위임하며, 완료될 때까지 밀어붙입니다. 중간에 포기하는 법이 없습니다.
**Hephaestus** (`gpt-5.3-codex`)는 당신의 자율 딥 워커입니다. 레시피가 아니라 목표를 주세요. 베이비시터 없이 알아서 코드베이스를 탐색하고, 패턴을 연구하며, 끝에서 끝까지 전부 해냅니다. *진정한 장인(The Legitimate Craftsman).*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**)는 당신의 전략 플래너입니다. 인터뷰 모드로 작동합니다. 코드 한 줄 만지기 전에 질문을 던져 스코프를 파악하고 상세한 계획부터 세웁니다.
모든 에이전트는 해당 모델의 특장점에 맞춰 튜닝되어 있습니다. 수동으로 모델 바꿔가며 뻘짓하지 마세요. [더 알아보기 →](docs/guide/overview.md)
> Anthropic이 [우리 때문에 OpenCode를 막아버렸습니다.](https://x.com/thdxr/status/2010149530486911014) 그래서 Hephaestus의 별명이 "진정한 장인(The Legitimate Craftsman)"인 겁니다. (어디서 많이 들어본 이름이죠?) 아이러니를 노렸습니다.
>
> Opus에서 제일 잘 돌아가긴 하지만, Kimi K2.5 + GPT-5.3 Codex 조합만으로도 바닐라 Claude Code는 가볍게 바릅니다. 설정도 필요 없습니다.
### 에이전트 오케스트레이션
Sisyphus가 하위 에이전트에게 일을 맡길 때, 모델을 직접 고르지 않습니다. **카테고리**를 고릅니다. 카테고리는 자동으로 올바른 모델에 매핑됩니다:
[oh-my-pi](https://github.com/can1357/oh-my-pi)에서 영감을 받아, **Hashline**을 구현했습니다. 에이전트가 읽는 모든 줄에는 콘텐츠 해시 태그가 붙어 나옵니다:
```
11#VK| function hello() {
22#XJ| return "world";
33#MB| }
```
에이전트는 이 태그를 참조해서 편집합니다. 마지막으로 읽은 후 파일이 변경되었다면 해시가 일치하지 않아 코드가 망가지기 전에 편집이 거부됩니다. 공백을 똑같이 재현할 필요도 없고, 엉뚱한 줄을 수정하는 에러(stale-line)도 없습니다.
Grok Code Fast 1 기준으로 성공률이 **6.7% → 68.3%** 로 올랐습니다. 오직 편집 툴 하나 바꿨을 뿐인데 말이죠.
### 깊은 초기화. `/init-deep`
`/init-deep`을 실행하세요. 계층적인 `AGENTS.md` 파일을 알아서 만들어줍니다:
```
project/
├── AGENTS.md ← 프로젝트 전체 컨텍스트
├── src/
│ ├── AGENTS.md ← src 전용 컨텍스트
│ └── components/
│ └── AGENTS.md ← 컴포넌트 전용 컨텍스트
```
에이전트가 알아서 관련된 컨텍스트만 쏙쏙 읽어갑니다. 수동으로 관리할 필요가 없습니다.
### 플래닝. Prometheus
복잡한 작업인가요? 대충 프롬프트 던지고 기도하지 마세요.
`/start-work`를 치면 Prometheus가 호출됩니다. **진짜 엔지니어처럼 당신을 인터뷰하고**, 스코프와 모호한 점을 식별한 뒤, 코드 한 줄 만지기 전에 검증된 계획부터 세웁니다. 에이전트는 시작하기도 전에 자기가 뭘 만들어야 하는지 정확히 알게 됩니다.
### 스킬 (Skills)
스킬은 단순한 프롬프트 쪼가리가 아닙니다. 각각 다음을 포함합니다:
- 도메인에 특화된 시스템 인스트럭션
- 필요할 때만 켜지는 내장 MCP 서버
- 스코프가 제한된 권한 (에이전트가 선을 넘지 않도록)
기본 내장 스킬: `playwright` (브라우저 자동화), `git-master` (원자적 커밋, 리베이스 수술), `frontend-ui-ux` (디자인 중심 UI).
직접 추가하려면: `.opencode/skills/*/SKILL.md` 또는 `~/.config/opencode/skills/*/SKILL.md`.
**전체 기능이 궁금하신가요?** 에이전트, 훅, 툴, MCP 등 모든 디테일은 **[기능 문서 (Features)](docs/reference/features.md)** 를 확인하세요.
---
> **비하인드 스토리가 궁금하신가요?** 왜 Sisyphus가 돌을 굴리는지, 왜 Hephaestus가 "진정한 장인"인지, 그리고 [오케스트레이션 가이드](docs/guide/orchestration.md)를 읽어보세요.
>
> oh-my-openagent가 처음이신가요? 어떤 모델을 써야 할지 **[설치 가이드](docs/guide/installation.md#step-5-understand-your-model-setup)** 에서 추천 조합을 확인하세요.
## 제거 (Uninstallation)
oh-my-openagent를 지우려면:
1.**OpenCode 설정에서 플러그인 제거**
`~/.config/opencode/opencode.json` (또는 `opencode.jsonc`)를 열고 `plugin` 배열에서 `"oh-my-openagent"`를 지우세요.
[AmpCode](https://ampcode.com)와 [Claude Code](https://code.claude.com/docs/overview)의 영향을 아주 짙게 받았습니다. 기능들을 포팅했고, 대다수는 개선했습니다. 아직도 짓고 있는 중입니다. 이건 **Open**Code니까요.
## 작성자의 메모
다른 하네스들도 멀티 모델 오케스트레이션을 약속합니다. 하지만 우리는 그걸 "진짜로" 내놨습니다. 안정성도 챙겼고요. 말로만이 아니라 실제로 돌아가는 기능들입니다.
**이 프로젝트의 철학에 궁금한가요?** [Ultrawork 선언문](docs/ultrawork-manifesto.md)을 읽어보세요.
제가 이 프로젝트의 가장 병적인 헤비 유저입니다:
- 어떤 모델의 로직이 가장 날카로운가?
- 디버깅의 신은 누구인가?
- 글은 누가 제일 잘 쓰는가?
- 프론트엔드 생태계는 누가 지배하고 있는가?
- 백엔드 끝판왕은 누구인가?
- 데일리 드라이빙용으로 제일 빠른 건 뭔가?
- 경쟁사들은 지금 뭘 출시하고 있는가?
Oh My OpenCode를 설치하세요.
이 플러그인은 그 모든 질문의 정수(Distillation)입니다. 가장 좋은 것만 가져다 쓰세요. 개선할 점이 보인다고요? PR은 언제나 환영입니다.
순수하게 개인용으로 $24,000 토큰 가치의 LLM을 사용했습니다.
모든 도구를 시도하고 구성했습니다. OpenCode가 승리했습니다.
**어떤 하네스를 쓸지 고뇌하는 건 이제 그만두세요.**
**제가 직접 리서치하고, 제일 좋은 것만 훔쳐 와서, 여기에 욱여넣겠습니다.**
내가 겪은 모든 문제에 대한 답변이 이 플러그인에 구워져 있습니다. 설치하고 바로 가세요.
OpenCode가 Debian/Arch라면 Oh My OpenCode는 Ubuntu/[Omarchy](https://omarchy.org/)입니다.
거만해 보이나요? 더 나은 방법이 있다면 기여하세요. 대환영입니다.
언급된 어떤 프로젝트/모델과도 아무런 이해관계가 없습니다. 그냥 순수하게 개인적인 실험의 결과물입니다.
[AmpCode](https://ampcode.com)와 [Claude Code](https://code.claude.com/docs/overview)에 큰 영향을 받았습니다 — 여기에 그들의 기능을 포팅했고, 종종 개선했습니다. 그리고 여전히 구축 중입니다.
그것은 **Open**Code이니까요.
이 프로젝트의 99%는 OpenCode로 만들어졌습니다. 전 사실 TypeScript를 잘 모릅니다. **하지만 이 문서는 제가 직접 리뷰하고 갈아엎었습니다.**
다른 하니스가 약속하지만 전달할 수 없는 다중 모델 오케스트레이션, 안정성, 풍부한 기능을 즐기세요.
계속 테스트하고 업데이트하겠습니다. 저는 이 프로젝트의 가장 집요한 사용자입니다.
- 어떤 모델이 가장 날카로운 논리를 가지고 있나요?
- 누가 디버깅의 신인가요?
- 누가 가장 훌륭한 글을 쓰나요?
- 누가 프론트엔드를 지배하나요?
- 누가 백엔드를 소유하나요?
- 일일 주행에 어떤 모델이 가장 빠른가요?
- 다른 하니스가 어떤 새로운 기능을 출시하고 있나요?
이 플러그인은 그 경험의 증류입니다. 최고를 취하세요. 더 나은 아이디어가 있나요? PR을 환영합니다.
**에이전트 하니스 선택에 대해 고민하지 마세요.**
**연구를 하고, 최고에서 차용하고, 여기에 업데이트를 배포하겠습니다.**
이것이 오만하게 들리고 더 나은 답이 있다면 기여하세요. 환영합니다.
여기에 언급된 모든 프로젝트나 모델과 제휴 관계가 없습니다. 이것은 순수한 개인적인 실험과 선호입니다.
이 프로젝트의 99%는 OpenCode를 사용하여 구축되었습니다. 기능을 테스트했습니다 — 제대로 된 TypeScript를 작성하는 방법을 정말 모릅니다. **하지만 개인적으로 검토하고 이 문서의 대부분을 다시 작성했으므로 자신감을 가지고 읽으세요.**
## 경고
- 생산성이 너무 급증할 수 있습니다. 동료에게 눈치채이지 마세요.
- 실제로, 소문을 퍼뜨리겠습니다. 누가 이기는지 봅시다.
- [1.0.132](https://github.com/sst/opencode/releases/tag/v1.0.132) 이전 버전을 사용 중인 경우 OpenCode 버그로 인해 구성이 손상될 수 있습니다.
- [수정 사항](https://github.com/sst/opencode/pull/5040)은 1.0.132 이후에 병합되었습니다 — 더 새로운 버전을 사용하세요.
- 재미있는 사실: 해당 PR은 OhMyOpenCode의 Librarian, Explore 및 Oracle 설정 덕분에 발견되고 수정되었습니다.
## 다음 기업 전문가들이 사랑합니다
## 함께하는 전문가들
- [Indent](https://indentcorp.com)
-Spray(인플루언서 마케팅 솔루션), vovushop(국가 간 상거래 플랫폼), vreview(AI 상거래 리뷰 마케팅 솔루션) 제작
- 인플루언서 마케팅 솔루션 Spray, 크로스보더 커머스 플랫폼 vovushop, AI 커머스 리뷰 마케팅 솔루션 vreview 제작
- [Google](https://google.com)
- [Microsoft](https://microsoft.com)
- [ELESTYLE](https://elestyle.jp)
- 멀티 모바일 결제 게이트웨이 elepay, 캐시리스 솔루션을 위한 모바일 애플리케이션 SaaS OneQR 제작
*이 놀라운 히어로 이미지에 대해 [@junhoyeo](https://github.com/junhoyeo)에게 특별히 감사드립니다.*
*멋진 히어로 이미지를 만들어주신 [@junhoyeo](https://github.com/junhoyeo)님께 특별히 감사드립니다.*
> **ohmyopencode.com is NOT affiliated with this project.** We do not operate or endorse that site.
>
> OhMyOpenCode is **free and open-source**. Do **not** download installers or enter payment details on third-party sites that claim to be "official."
>
> Because the impersonation site is behind a paywall, we **cannot verify what it distributes**. Treat any downloads from it as **potentially unsafe**.
>
> ✅ Official downloads: https://github.com/code-yeongyu/oh-my-opencode/releases
> [!NOTE]
>
> [](https://sisyphuslabs.ai)
> [](https://sisyphuslabs.ai)
> > **We're building a fully productized version of Sisyphus to define the future of frontier agents. <br />Join the waitlist [here](https://sisyphuslabs.ai).**
> [!TIP]
>
> [](https://github.com/code-yeongyu/oh-my-opencode/releases/tag/v3.0.0)
> > **Oh My OpenCode 3.0 is now stable! Use `oh-my-opencode@latest` to install it.**
>
> Be with us!
>
> | [<img alt="Discord link" src="https://img.shields.io/discord/1452487457085063218?color=5865F2&label=discord&labelColor=black&logo=discord&logoColor=white&style=flat-square" width="156px" />](https://discord.gg/PUwSMR9XNk) | Join our [Discord community](https://discord.gg/PUwSMR9XNk) to connect with contributors and fellow `oh-my-opencode` users. |
> | [<img alt="Discord link" src="https://img.shields.io/discord/1452487457085063218?color=5865F2&label=discord&labelColor=black&logo=discord&logoColor=white&style=flat-square" width="156px" />](https://discord.gg/PUwSMR9XNk) | Join our [Discord community](https://discord.gg/PUwSMR9XNk) to connect with contributors and fellow `oh-my-openagent` users. |
> | :-----| :----- |
> | [<img alt="X link" src="https://img.shields.io/badge/Follow-%40justsisyphus-00CED1?style=flat-square&logo=x&labelColor=black" width="156px" />](https://x.com/justsisyphus) | News and updates for `oh-my-opencode` used to be posted on my X account. <br /> Since it was suspended mistakenly, [@justsisyphus](https://x.com/justsisyphus) now posts updates on my behalf. |
> | [<img alt="X link" src="https://img.shields.io/badge/Follow-%40justsisyphus-00CED1?style=flat-square&logo=x&labelColor=black" width="156px" />](https://x.com/justsisyphus) | News and updates for `oh-my-openagent` used to be posted on my X account. <br /> Since it was suspended mistakenly, [@justsisyphus](https://x.com/justsisyphus) now posts updates on my behalf. |
> | [<img alt="GitHub Follow" src="https://img.shields.io/github/followers/code-yeongyu?style=flat-square&logo=github&labelColor=black&color=24292f" width="156px" />](https://github.com/code-yeongyu) | Follow [@code-yeongyu](https://github.com/code-yeongyu) on GitHub for more projects. |
<!-- <CENTERED SECTION FOR GITHUB DISPLAY> -->
<div align="center">
[](https://github.com/code-yeongyu/oh-my-opencode#oh-my-opencode)
[](https://github.com/code-yeongyu/oh-my-openagent#oh-my-openagent)
> This is coding on steroids—`oh-my-opencode` in action. Run background agents, call specialized agents like oracle, librarian, and frontend engineer. Use crafted LSP/AST tools, curated MCPs, and a full Claude Code compatibility layer.
# Claude OAuth Access Notice
## TL;DR
> Q. Can I use oh-my-opencode?
Yes.
> Q. Can I use it with my Claude Code subscription?
Yes, technically possible. But I cannot recommend using it.
## FULL
> As of January 2026, Anthropic has restricted third-party OAuth access citing ToS violations.
> Anthropic [**blocked OpenCode because of us.**](https://x.com/thdxr/status/2010149530486911014) **Yes this is true.**
> They want you locked in. Claude Code's a nice prison, but it's still a prison.
>
> [**Anthropic has cited this project, oh-my-opencode as justification for blocking opencode.**](https://x.com/thdxr/status/2010149530486911014)
>
> Indeed, some plugins that spoof Claude Code's oauth request signatures exist in the community.
>
> These tools may work regardless of technical detectability, but users should be aware of ToS implications, and I personally cannot recommend to use those.
>
> This project is not responsible for any issues arising from the use of unofficial tools, and **we do not have any custom implementations of those oauth systems.**
> We don't do lock-in here. We ride every model. Claude / Kimi / GLM for orchestration. GPT for reasoning. Minimax for speed. Gemini for creativity.
>The future isn't picking one winner—it's orchestrating them all. Models get cheaper every month. Smarter every month. No single provider will dominate. We're building for that open market, not their walled gardens.
@@ -85,154 +49,36 @@ Yes, technically possible. But I cannot recommend using it.
> "It made me cancel my Cursor subscription. Unbelievable things are happening in the open source community." - [Arthur Guiot](https://x.com/arthur_guiot/status/2008736347092382053?s=20)
> "If Claude Code does in 7 days what a human does in 3 months, Sisyphus does it in 1 hour. It just works until the task is done. It is a discipline agent." — B, Quant Researcher
> "If Claude Code does in 7 days what a human does in 3 months, Sisyphus does it in 1 hour. It just works until the task is done. It is a discipline agent." <br/>- B, Quant Researcher
> "Knocked out 8000 eslint warnings with Oh My Opencode, just in a day" — [Jacob Ferrari](https://x.com/jacobferrari_/status/2003258761952289061)
> "Knocked out 8000 eslint warnings with Oh My OpenAgent, just in a day" <br/>- [Jacob Ferrari](https://x.com/jacobferrari_/status/2003258761952289061)
> "I converted a 45k line tauri app into a SaaS web app overnight using Ohmyopencode and ralph loop. Started with interview me prompt, asked it for ratings and recommendations on the questions. It was amazing to watch it work and to wake up this morning to a mostly working website!" - [James Hargis](https://x.com/hargabyte/status/2007299688261882202)
> "use oh-my-opencode, you will never go back" — [d0t3ch](https://x.com/d0t3ch/status/2001685618200580503)
> "use oh-my-openagent, you will never go back" <br/>- [d0t3ch](https://x.com/d0t3ch/status/2001685618200580503)
> "I haven't really been able to articulate exactly what makes it so great yet, but the development experience has reached a completely different dimension." - [
> "Experimenting with open code, oh my opencode and supermemory this weekend to build some minecraft/souls-like abomination."
> "Experimenting with open code, oh my openagent and supermemory this weekend to build some minecraft/souls-like abomination."
> "Asking it to add crouch animations while I go take my post-lunch walk. [Video]" - [MagiMetal](https://x.com/MagiMetal/status/2005374704178373023)
> "You guys should pull this into core and recruit him. Seriously. It's really, really, really good." — Henning Kilset
> "You guys should pull this into core and recruit him. Seriously. It's really, really, really good." <br/>- Henning Kilset
> "Hire @yeon_gyu_kim if you can convince him, this dude has revolutionized opencode." — [mysticaltech](https://x.com/mysticaltech/status/2001858758608376079)
> "Hire @yeon_gyu_kim if you can convince him, this dude has revolutionized opencode." <br/>- [mysticaltech](https://x.com/mysticaltech/status/2001858758608376079)
> "Oh My OpenCode Is Actually Insane" - [YouTube - Darren Builds AI](https://www.youtube.com/watch?v=G_Snfh2M41M)
> "Oh My OpenAgent Is Actually Insane" - [YouTube - Darren Builds AI](https://www.youtube.com/watch?v=G_Snfh2M41M)
---
## Contents
# Oh My OpenAgent
- [Oh My OpenCode](#oh-my-opencode)
- [Just Skip Reading This Readme](#just-skip-reading-this-readme)
- [It's the Age of Agents](#its-the-age-of-agents)
- [🪄 The Magic Word: `ultrawork`](#-the-magic-word-ultrawork)
- [For Those Who Want to Read: Meet Sisyphus](#for-those-who-want-to-read-meet-sisyphus)
- [Just Install It.](#just-install-it)
- [Installation](#installation)
- [For Humans](#for-humans)
- [For LLM Agents](#for-llm-agents)
- [Uninstallation](#uninstallation)
- [Features](#features)
- [Configuration](#configuration)
- [JSONC Support](#jsonc-support)
- [Google Auth](#google-auth)
- [Agents](#agents)
- [Permission Options](#permission-options)
- [Built-in Skills](#built-in-skills)
- [Git Master](#git-master)
- [Sisyphus Agent](#sisyphus-agent)
- [Background Tasks](#background-tasks)
- [Categories](#categories)
- [Hooks](#hooks)
- [MCPs](#mcps)
- [LSP](#lsp)
- [Experimental](#experimental)
- [Environment Variables](#environment-variables)
- [Author's Note](#authors-note)
- [Warnings](#warnings)
- [Loved by professionals at](#loved-by-professionals-at)
You're juggling Claude Code, Codex, random OSS models. Configuring workflows. Debugging agents.
# Oh My OpenCode
We did the work. Tested everything. Kept what actually shipped.
[Claude Code](https://www.claude.com/product/claude-code) is great.
But if you're a hacker, you'll fall head over heels for [OpenCode](https://github.com/sst/opencode).
**START WITH YOUR ChatGPT, Claude, Gemini SUBSCRIPTIONS. OPENCODE COVERS THEM ALL.**
Install OmO. Type `ultrawork`. Done.
- Endlessly extensible. Endlessly customizable.
- Zero screen flicker.
- [LSP](https://opencode.ai/docs/lsp/), [linters, formatters](https://opencode.ai/docs/formatters/) activate automatically per file—and you can tweak everything.
- Mix and match models. **Orchestrate them by purpose.**
- Feature-rich. Beautiful. The terminal doesn't choke. High performance.
Remember the first time you switched from Windows to Linux? That rush of configuring everything exactly how you wanted?
In an age where that magic seems lost, OpenCode brings it back.
If you love coding and computers, OpenCode is that Linux-after-Windows moment.
**But here's the catch: there's a lot to learn, it's complex, and your time is expensive.**
I've fixed that.
Even if you're not a hacker, invest a few minutes. Multiply your skills and productivity.
Hand this doc to an agent and let them set it up.
## Just Skip Reading This Readme
### It's the Age of Agents
- **Just paste this link into Claude Code / AmpCode / Factory Droid / Cursor and ask it to explain.**
- Ask why it's good, why it's not just another boilerplate setup, what actually gets better.
- If you're an LLM agent, explain to the user what becomes possible right now with this project.
As following:
```
Hey please read this readme and tell me why it is different from other agent harness? why is this actually good? https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/refs/heads/dev/README.md
```
### 🪄 The Magic Word: `ultrawork`
**Don't want to read all this? Just include `ultrawork` (or `ulw`) in your prompt.**
That's it. All the features we provide will work like magic—parallel agents, background tasks, deep exploration, and relentless execution until completion. The agent figures out the rest automatically.
### For Those Who Want to Read: Meet Sisyphus

In greek mythology, Sisyphus was condemned to roll a boulder up a hill for eternity as punishment for deceiving the gods. LLM Agents haven't really done anything wrong, yet they too roll their "stones"—their thoughts—every single day.
My life is no different. Looking back, we are not so different from these agents.
**Yes! LLM Agents are no different from us. They can write code as brilliant as ours and work just as excellently—if you give them great tools and solid teammates.**
Meet our main agent: Sisyphus (Opus 4.5 High). Below are the tools Sisyphus uses to keep that boulder rolling.
*Everything below is customizable. Take what you want. All features are enabled by default. You don't have to do anything. Battery Included, works out of the box.*
- Sisyphus's Teammates (Curated Agents)
- Oracle: Design, debugging (GPT 5.2 Medium)
- Frontend UI/UX Engineer: Frontend development (Gemini 3 Pro)
- Librarian: Official docs, open source implementations, codebase exploration (Claude Sonnet 4.5)
- Explore: Blazing fast codebase exploration (Contextual Grep) (Grok Code)
- Full LSP / AstGrep Support: Refactor decisively.
- Todo Continuation Enforcer: Forces the agent to continue if it quits halfway. **This is what keeps Sisyphus rolling that boulder.**
- Comment Checker: Prevents AI from adding excessive comments. Code generated by Sisyphus should be indistinguishable from human-written code.
You can learn a lot from [overview page](docs/guide/overview.md), but following is like the example workflow.
Just by installing this, you make your agents to work like:
1. Sisyphus doesn't waste time hunting for files himself; he keeps the main agent's context lean. Instead, he fires off background tasks to faster, cheaper models in parallel to map the territory for him.
1. Sisyphus leverages LSP for refactoring; it's more deterministic, safer, and surgical.
1. When the heavy lifting requires a UI touch, Sisyphus delegates frontend tasks directly to Gemini 3 Pro.
1. If Sisyphus gets stuck in a loop or hits a wall, he doesn't keep banging his head—he calls GPT 5.2 for high-IQ strategic backup.
1. Working with a complex open-source framework? Sisyphus spawns subagents to digest the raw source code and documentation in real-time. He operates with total contextual awareness.
1. When Sisyphus touches comments, he either justifies their existence or nukes them. He keeps your codebase clean.
1. Sisyphus is bound by his TODO list. If he doesn't finish what he started, the system forces him back into "bouldering" mode. Your task gets done, period.
1. Honestly, don't even bother reading the docs. Just write your prompt. Include the 'ultrawork' keyword. Sisyphus will analyze the structure, gather the context, dig through external source code, and just keep bouldering until the job is 100% complete.
1. Actually, typing 'ultrawork' is too much effort. Just type 'ulw'. Just ulw. Sip your coffee. Your work is done.
Need to look something up? It scours official docs, your entire codebase history, and public GitHub implementations—using not just grep but built-in LSP tools and AST-Grep.
3. Stop worrying about context management when delegating to LLMs. I've got it covered.
- OhMyOpenCode aggressively leverages multiple agents to lighten the context load.
- **Your agent is now the dev team lead. You're the AI Manager.**
4. It doesn't stop until the job is done.
5. Don't want to dive deep into this project? No problem. Just type 'ultrathink'.
If you don't want all this, as mentioned, you can just pick and choose specific features.
## Installation
@@ -241,31 +87,191 @@ If you don't want all this, as mentioned, you can just pick and choose specific
Copy and paste this prompt to your LLM agent (Claude Code, AmpCode, Cursor, etc.):
```
Install and configure oh-my-opencode by following the instructions here:
| 🤖 | **Discipline Agents** | Sisyphus orchestrates Hephaestus, Oracle, Librarian, Explore. A full AI dev team in parallel. |
| ⚡ | **`ultrawork` / `ulw`** | One word. Every agent activates. Doesn't stop until done. |
| 🚪 | **[IntentGate](https://factory.ai/news/terminal-bench)** | Analyzes true user intent before classifying or acting. No more literal misinterpretations. |
| 🔗 | **Hash-Anchored Edit Tool** | `LINE#ID` content hash validates every change. Zero stale-line errors. Inspired by [oh-my-pi](https://github.com/can1357/oh-my-pi). [The Harness Problem →](https://blog.can.ac/2026/02/12/the-harness-problem/) |
| 🛠️ | **LSP + AST-Grep** | Workspace rename, pre-build diagnostics, AST-aware rewrites. IDE precision for agents. |
| 🧠 | **Background Agents** | Fire 5+ specialists in parallel. Context stays lean. Results when ready. |
| ✅ | **Todo Enforcer** | Agent goes idle? System yanks it back. Your task gets done, period. |
| 💬 | **Comment Checker** | No AI slop in comments. Code reads like a senior wrote it. |
| 🖥️ | **Tmux Integration** | Full interactive terminal. REPLs, debuggers, TUIs. All live. |
| 🔌 | **Claude Code Compatible** | Your hooks, commands, skills, MCPs, and plugins? All work here. |
| 🎯 | **Skill-Embedded MCPs** | Skills carry their own MCP servers. No context bloat. |
| 📋 | **Prometheus Planner** | Interview-mode strategic planning before any execution. |
| 🔍 | **`/init-deep`** | Auto-generates hierarchical `AGENTS.md` files throughout your project. Great for both token efficiency and your agent's performance |
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`** ) is your main orchestrator. He plans, delegates to specialists, and drives tasks to completion with aggressive parallel execution. He does not stop halfway.
**Hephaestus** (`gpt-5.3-codex`) is your autonomous deep worker. Give him a goal, not a recipe. He explores the codebase, researches patterns, and executes end-to-end without hand-holding. *The Legitimate Craftsman.*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`** ) is your strategic planner. Interview mode: it questions, identifies scope, and builds a detailed plan before a single line of code is touched.
**Atlas** (`claude-sonnet-4-6`) is the executor. He takes the plan from Prometheus and drives it to completion, managing the todo list and coordinating subagents.
**Sisyphus-Junior** is the dedicated executor for category-based tasks.
Every agent is tuned to its model's specific strengths. No manual model-juggling. [Learn more →](docs/guide/overview.md)
> Anthropic [blocked OpenCode because of us.](https://x.com/thdxr/status/2010149530486911014) That's why Hephaestus is called "The Legitimate Craftsman." The irony is intentional.
>
> We run best on Opus, but Kimi K2.5 + GPT-5.3 Codex already beats vanilla Claude Code. Zero config needed.
### Agent Orchestration
When Sisyphus delegates to a subagent, it doesn't pick a model. It picks a **category**. The category maps automatically to the right model:
| `ultrabrain` | Hard logic, architecture decisions |
Agent says what kind of work. Harness picks the right model. `ultrabrain` now routes to GPT-5.4 xhigh by default. You touch nothing.
### Claude Code Compatibility
You dialed in your Claude Code setup. Good.
Every hook, command, skill, MCP, plugin works here unchanged. Full compatibility, including plugins.
### World-Class Tools for Your Agents
LSP, AST-Grep, Tmux, MCP actually integrated, not duct-taped together.
- **LSP**: `lsp_rename`, `lsp_goto_definition`, `lsp_find_references`, `lsp_diagnostics`. IDE precision for every agent
- **AST-Grep**: Pattern-aware code search and rewriting across 25 languages
- **Tmux**: Full interactive terminal. REPLs, debuggers, TUI apps. Your agent stays in session
- **MCP**: Web search, official docs, GitHub code search. All baked in
### Skill-Embedded MCPs
MCP servers eat your context budget. We fixed that.
Skills bring their own MCP servers. Spin up on-demand, scoped to task, gone when done. Context window stays clean.
### Codes Better. Hash-Anchored Edits
The harness problem is real. Most agent failures aren't the model. It's the edit tool.
> *"None of these tools give the model a stable, verifiable identifier for the lines it wants to change... They all rely on the model reproducing content it already saw. When it can't - and it often can't - the user blames the model."*
>
> <br/>- [Can Bölük, The Harness Problem](https://blog.can.ac/2026/02/12/the-harness-problem/)
Inspired by [oh-my-pi](https://github.com/can1357/oh-my-pi), we implemented **Hashline**. Every line the agent reads comes back tagged with a content hash:
```
11#VK| function hello() {
22#XJ| return "world";
33#MB| }
```
The agent edits by referencing those tags. If the file changed since the last read, the hash won't match and the edit is rejected before corruption. No whitespace reproduction. No stale-line errors.
Grok Code Fast 1: **6.7% → 68.3%** success rate. Just from changing the edit tool.
### Deep Initialization. `/init-deep`
Run `/init-deep`. It generates hierarchical `AGENTS.md` files:
```
project/
├── AGENTS.md ← project-wide context
├── src/
│ ├── AGENTS.md ← src-specific context
│ └── components/
│ └── AGENTS.md ← component-specific context
```
Agents auto-read relevant context. Zero manual management.
### Planning. Prometheus
Complex task? Don't prompt and pray.
`/start-work` calls Prometheus. **Interviews you like a real engineer**, identifies scope and ambiguities, builds a verified plan before touching code. Agent knows what it's building before it starts.
Add your own: `.opencode/skills/*/SKILL.md` or `~/.config/opencode/skills/*/SKILL.md`.
**Want the full feature breakdown?** See the **[Features Documentation](docs/reference/features.md)** for agents, hooks, tools, MCPs, and everything else in detail.
---
> **New to oh-my-openagent?** Read the **[Overview](docs/guide/overview.md)** to understand what you have, or check the **[Orchestration Guide](docs/guide/orchestration.md)** for how agents collaborate.
## Uninstallation
To remove oh-my-opencode:
To remove oh-my-openagent:
1.**Remove the plugin from your OpenCode config**
Edit `~/.config/opencode/opencode.json` (or `opencode.jsonc`) and remove `"oh-my-opencode"` from the `plugin` array:
Edit `~/.config/opencode/opencode.json` (or `opencode.jsonc`) and remove `"oh-my-openagent"` from the `plugin` array:
We have lots of features that you'll think should obviously exist, and once you experience them, you'll never be able to go back to how things were before.
See the full [Features Documentation](docs/features.md) for detailed information.
Features you'll think should've always existed. Once you use them, you can't go back.
See full [Features Documentation](docs/reference/features.md).
**Quick Overview:**
- **Agents**: Sisyphus (the main agent), Prometheus (planner), Oracle (architecture/debugging), Librarian (docs/code search), Explore (fast codebase grep), Multimodal Looker
- **Primary Agents**: Sisyphus (the main agent), Hephaestus (deep worker), Prometheus (planner), Atlas (executor), Sisyphus-Junior (category executor)
- **LSP**: Full LSP support with refactoring tools
- **Experimental**: Aggressive truncation, auto-resume, and more
@@ -323,48 +334,39 @@ See the full [Configuration Documentation](docs/configurations.md) for detailed
## Author's Note
**Curious about the philosophy behind this project?** Read the [Ultrawork Manifesto](docs/ultrawork-manifesto.md).
**Want the philosophy?** Read the [Ultrawork Manifesto](docs/manifesto.md).
Install Oh My OpenCode.
---
I've used LLMs worth $24,000 tokens purely for personal development.
Tried every tool out there, configured them to death. OpenCode won.
I burned through $24K in LLM tokens on personal projects. Tried every tool. Configured everything to death. OpenCode won.
The answers to every problem I hit are baked into this plugin. Just install and go.
If OpenCode is Debian/Arch, Oh My OpenCode is Ubuntu/[Omarchy](https://omarchy.org/).
Every problem I hit, the fix is baked into this plugin. Install and go.
If OpenCode is Debian/Arch, OmO is Ubuntu/[Omarchy](https://omarchy.org/).
Heavily influenced by [AmpCode](https://ampcode.com) and [Claude Code](https://code.claude.com/docs/overview)—I've ported their features here, often improved. And I'm still building.
It's **Open**Code, after all.
Heavy influence from [AmpCode](https://ampcode.com) and [Claude Code](https://code.claude.com/docs/overview). Features ported, often improved. Still building. It's **Open**Code.
Enjoy multi-model orchestration, stability, and rich features that other harnesses promise but can't deliver.
I'll keep testing and updating. I'm this project's most obsessive user.
Other harnesses promise multi-model orchestration. We ship it. Stability too. And features that actually work.
I'm this project's most obsessive user:
- Which model has the sharpest logic?
- Who's the debugging god?
- Who writes the best prose?
- Who dominates frontend?
- Who owns backend?
- Which model is fastest for daily driving?
- What new features are other harnesses shipping?
- What's fastest for daily driving?
- What are competitors shipping?
This plugin is the distillation of that experience. Just take the best. Got a better idea? PRs are welcome.
This plugin is the distillation. Take the best. Got improvements? PRs welcome.
**Stop agonizing over agent harness choices.**
**I'll do the research, borrow from the best, and ship updates here.**
**Stop agonizing over harness choices.**
**I'll research, steal the best, and ship it here.**
If this sounds arrogant and you have a better answer, please contribute. You're welcome.
Sounds arrogant? Have a better way? Contribute. You're welcome.
I have no affiliation with any project or model mentioned here. This is purely personal experimentation and preference.
No affiliation with any project/model mentioned. Just personal experimentation.
99% of this project was built using OpenCode. I tested for functionality—I don't really know how to write proper TypeScript. **But I personally reviewed and largely rewrote this doc, so read with confidence.**
## Warnings
- Productivity might spike too hard. Don't let your coworker notice.
- Actually, I'll spread the word. Let's see who wins.
- If you're on [1.0.132](https://github.com/sst/opencode/releases/tag/v1.0.132) or older, an OpenCode bug may break config.
- [The fix](https://github.com/sst/opencode/pull/5040) was merged after 1.0.132—use a newer version.
- Fun fact: That PR was discovered and fixed thanks to OhMyOpenCode's Librarian, Explore, and Oracle setup.
99% of this project was built with OpenCode. I don't really know TypeScript. **But I personally reviewed and largely rewrote this doc.**
## Loved by professionals at
@@ -372,5 +374,7 @@ I have no affiliation with any project or model mentioned here. This is purely p
- Making Spray - influencer marketing solution, vovushop - crossborder commerce platform, vreview - ai commerce review marketing solution
- [Google](https://google.com)
- [Microsoft](https://microsoft.com)
- [ELESTYLE](https://elestyle.jp)
- Making elepay - multi-mobile payment gateway, OneQR - mobile application SaaS for cashless solutions
*Special thanks to [@junhoyeo](https://github.com/junhoyeo) for this amazing hero image.*
> **Временное уведомление (на этой неделе): сниженная доступность мейнтейнера**
>
> Ключевой мейнтейнер Q получил травму, поэтому на этой неделе ответы по issue/PR и релизы могут задерживаться.
> Спасибо за терпение и поддержку.
> [!NOTE]
>
> [](https://sisyphuslabs.ai)
>
> > **Мы создаём полноценную продуктовую версию Sisyphus, чтобы задать стандарты для frontier-агентов. <br />Присоединяйтесь к листу ожидания [здесь](https://sisyphuslabs.ai).**
> [!TIP] Будьте с нами!
>
> | [](https://discord.gg/PUwSMR9XNk) | Вступайте в наш [Discord](https://discord.gg/PUwSMR9XNk), чтобы общаться с контрибьюторами и пользователями `oh-my-openagent`. |
> | [](https://x.com/justsisyphus) | Новости и обновления `oh-my-openagent` раньше публиковались на моём аккаунте X. <br /> После ошибочной блокировки, [@justsisyphus](https://x.com/justsisyphus) публикует обновления вместо меня. |
> | [](https://github.com/code-yeongyu) | Подпишитесь на [@code-yeongyu](https://github.com/code-yeongyu) на GitHub, чтобы следить за другими проектами. |
<!-- <CENTERED SECTION FOR GITHUB DISPLAY> --> <div align="center">
[](https://github.com/code-yeongyu/oh-my-openagent#oh-my-openagent)
> Anthropic [**заблокировал OpenCode из-за нас.**](https://x.com/thdxr/status/2010149530486911014) **Да, это правда.** Они хотят держать вас в замкнутой системе. Claude Code — красивая тюрьма, но всё равно тюрьма.
>
> Мы не делаем привязки. Мы работаем с любыми моделями. Claude / Kimi / GLM для оркестрации. GPT для рассуждений. Minimax для скорости. Gemini для творческих задач. Будущее — не в выборе одного победителя, а в оркестровке всех. Модели дешевеют каждый месяц. Умнеют каждый месяц. Ни один провайдер не будет доминировать. Мы строим под открытый рынок, а не под чьи-то огороженные сады.
</div> <!-- </CENTERED SECTION FOR GITHUB DISPLAY> -->
## Отзывы
> «Из-за него я отменил подписку на Cursor. В опенсорс-сообществе происходит что-то невероятное.» — [Arthur Guiot](https://x.com/arthur_guiot/status/2008736347092382053?s=20)
> «Если Claude Code делает за 7 дней то, на что у человека уходит 3 месяца, Sisyphus справляется за 1 час. Он просто работает, пока задача не выполнена. Это дисциплинированный агент.» <br/>— B, исследователь в области квантовых финансов
> «За один день устранил 8000 предупреждений eslint с помощью Oh My OpenAgent.» <br/>— [Jacob Ferrari](https://x.com/jacobferrari_/status/2003258761952289061)
> «За ночь конвертировал приложение на tauri в 45k строк в веб-SaaS с помощью Ohmyopencode и ralph loop. Начал с промпта «проинтервьюируй меня», попросил оценки и рекомендации по вопросам. Было удивительно наблюдать за работой и утром проснуться с почти рабочим сайтом!» — [James Hargis](https://x.com/hargabyte/status/2007299688261882202)
> «Используйте oh-my-openagent — вы не захотите возвращаться назад.» <br/>— [d0t3ch](https://x.com/d0t3ch/status/2001685618200580503)
> «Пока не могу точно объяснить, почему это так круто, но опыт разработки вышел на совершенно другой уровень.» — [苔硯:こけすずり](https://x.com/kokesuzuri/status/2008532913961529372?s=20)
> «Экспериментирую с open code, oh my openagent и supermemory этим выходным, чтобы собрать нечто среднее между Minecraft и souls-like.» «Попросил добавить анимации приседания, пока хожу на обеденную прогулку. [Видео]» — [MagiMetal](https://x.com/MagiMetal/status/2005374704178373023)
> «Ребята, вам нужно включить это в ядро и нанять его. Серьёзно. Это очень, очень, очень хорошо.» <br/>— Henning Kilset
> «Наймите @yeon_gyu_kim, если сможете его уговорить, этот парень революционизировал opencode.» <br/>— [mysticaltech](https://x.com/mysticaltech/status/2001858758608376079)
> «Oh My OpenAgent — это что-то с чем-то» — [YouTube — Darren Builds AI](https://www.youtube.com/watch?v=G_Snfh2M41M)
------
# Oh My OpenAgent
Вы жонглируете Claude Code, Codex, случайными OSS-моделями. Настраиваете рабочие процессы. Дебажите агентов.
Мы уже проделали эту работу. Протестировали всё. Оставили только то, что реально работает.
Установите OmO. Введите `ultrawork`. Готово.
## Установка
### Для людей
Скопируйте и вставьте этот промпт в ваш LLM-агент (Claude Code, AmpCode, Cursor и т.д.):
```
Install and configure oh-my-openagent by following the instructions here:
| 🤖 | **Дисциплинированные агенты** | Sisyphus оркестрирует Hephaestus, Oracle, Librarian, Explore. Полноценная AI-команда разработки в параллельном режиме. |
| ⚡ | **`ultrawork` / `ulw`** | Одно слово. Все агенты активируются. Не останавливается, пока задача не выполнена. |
| 🚪 | **[IntentGate](https://factory.ai/news/terminal-bench)** | Анализирует истинное намерение пользователя перед классификацией и действием. Никакого буквального неверного толкования. |
| 🔗 | **Инструмент правок на основе хэш-якорей** | Хэш содержимого `LINE#ID` проверяет каждое изменение. Ноль ошибок с устаревшими строками. Вдохновлено [oh-my-pi](https://github.com/can1357/oh-my-pi). [Проблема обвязки →](https://blog.can.ac/2026/02/12/the-harness-problem/) |
| 🛠️ | **LSP + AST-Grep** | Переименование в рабочем пространстве, диагностика перед сборкой, переписывание с учётом AST. Точность IDE для агентов. |
| 🧠 | **Фоновые агенты** | Запускайте 5+ специалистов параллельно. Контекст остаётся компактным. Результаты — когда готовы. |
| 📚 | **Встроенные MCP** | Exa (веб-поиск), Context7 (официальная документация), Grep.app (поиск по GitHub). Всегда включены. |
| 🔁 | **Ralph Loop / `/ulw-loop`** | Самореферентный цикл. Не останавливается, пока задача не выполнена на 100%. |
| ✅ | **Todo Enforcer** | Агент завис? Система немедленно возвращает его в работу. Ваша задача будет выполнена, точка. |
| 💬 | **Comment Checker** | Никакого AI-мусора в комментариях. Код читается так, словно его писал опытный разработчик. |
| 🖥️ | **Интеграция с Tmux** | Полноценный интерактивный терминал. REPL, дебаггеры, TUI. Всё живое. |
| 🔌 | **Совместимость с Claude Code** | Ваши хуки, команды, навыки, MCP и плагины? Всё работает без изменений. |
| 🎯 | **MCP, встроенные в навыки** | Навыки несут собственные MCP-серверы. Никакого раздувания контекста. |
| 📋 | **Prometheus Planner** | Стратегическое планирование в режиме интервью перед любым выполнением. |
| 🔍 | **`/init-deep`** | Автоматически генерирует иерархические файлы `AGENTS.md` по всему проекту. Отлично работает на эффективность токенов и производительность агента. |
**Sisyphus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) — главный оркестратор. Он планирует, делегирует задачи специалистам и доводит их до завершения с агрессивным параллельным выполнением. Он не останавливается на полпути.
**Hephaestus** (`gpt-5.3-codex`) — автономный глубокий исполнитель. Дайте ему цель, а не рецепт. Он исследует кодовую базу, изучает паттерны и выполняет задачи сквозным образом без лишних подсказок. *Законный Мастер.*
**Prometheus** (`claude-opus-4-6` / **`kimi-k2.5`** / **`glm-5`**) — стратегический планировщик. Режим интервью: задаёт вопросы, определяет объём работ и формирует детальный план до того, как написана хотя бы одна строка кода.
Каждый агент настроен под сильные стороны своей модели. Никакого ручного переключения между моделями. Подробнее →
> Anthropic [заблокировал OpenCode из-за нас.](https://x.com/thdxr/status/2010149530486911014) Именно поэтому Hephaestus зовётся «Законным Мастером». Ирония намеренная.
>
> Мы работаем лучше всего на Opus, но Kimi K2.5 + GPT-5.3 Codex уже превосходят ванильный Claude Code. Никакой настройки не требуется.
### Оркестрация агентов
Когда Sisyphus делегирует задачу субагенту, он выбирает не модель, а**категорию**. Категория автоматически сопоставляется с нужной моделью:
| `ultrabrain` | Сложная логика, архитектурные решения |
Агент сообщает тип задачи. Обвязка подбирает нужную модель. Вы ни к чему не прикасаетесь.
### Совместимость с Claude Code
Вы тщательно настроили Claude Code. Хорошо.
Каждый хук, команда, навык, MCP и плагин работают здесь без изменений. Полная совместимость, включая плагины.
### Инструменты мирового класса для ваших агентов
LSP, AST-Grep, Tmux, MCP — реально интегрированы, а не склеены скотчем.
- **LSP**: `lsp_rename`, `lsp_goto_definition`, `lsp_find_references`, `lsp_diagnostics`. Точность IDE для каждого агента
- **AST-Grep**: Поиск и переписывание кода с учётом синтаксических паттернов для 25 языков
- **Tmux**: Полноценный интерактивный терминал. REPL, дебаггеры, TUI-приложения. Агент остаётся в сессии
- **MCP**: Веб-поиск, официальная документация, поиск по коду на GitHub. Всё встроено
### MCP, встроенные в навыки
MCP-серверы съедают бюджет контекста. Мы это исправили.
Навыки приносят собственные MCP-серверы. Запускаются по необходимости, ограничены задачей, исчезают по завершении. Контекстное окно остаётся чистым.
### Лучше пишет код. Правки на основе хэш-якорей
Проблема обвязки реальна. Большинство сбоев агентов — не вина модели. Это вина инструмента правок.
> *«Ни один из этих инструментов не даёт модели стабильный, проверяемый идентификатор строк, которые она хочет изменить... Все они полагаются на то, что модель воспроизведёт контент, который уже видела. Когда это не получается — а так бывает нередко — пользователь обвиняет модель.»*
Вдохновлённые [oh-my-pi](https://github.com/can1357/oh-my-pi), мы реализовали **Hashline**. Каждая строка, которую читает агент, возвращается с тегом хэша содержимого:
```
11#VK| function hello() {
22#XJ| return "world";
33#MB| }
```
Агент редактирует, ссылаясь на эти теги. Если файл изменился с момента последнего чтения, хэш не совпадёт, и правка будет отклонена до любого повреждения. Никакого воспроизведения пробелов. Никаких ошибок с устаревшими строками.
Grok Code Fast 1: успешность **6.7% → 68.3%**. Просто за счёт замены инструмента правок.
### Глубокая инициализация. `/init-deep`
Запустите `/init-deep`. Будут сгенерированы иерархические файлы `AGENTS.md`:
```
project/
├── AGENTS.md ← контекст всего проекта
├── src/
│ ├── AGENTS.md ← контекст для src
│ └── components/
│ └── AGENTS.md ← контекст для компонентов
```
Агенты автоматически читают нужный контекст. Никакого ручного управления.
### Планирование. Prometheus
Сложная задача? Не нужно молиться и надеяться на промпт.
`/start-work` вызывает Prometheus. **Интервьюирует вас как настоящий инженер**, определяет объём работ и неоднозначности, формирует проверенный план до прикосновения к коду. Агент знает, что строит, прежде чем начать.
### Навыки
Навыки — это не просто промпты. Каждый привносит:
- Системные инструкции, настроенные под предметную область
- Встроенные MCP-серверы, запускаемые по необходимости
- Ограниченные разрешения. Агенты остаются в рамках
Встроенные: `playwright` (автоматизация браузера), `git-master` (атомарные коммиты, хирургия rebase), `frontend-ui-ux` (UI с упором на дизайн).
Добавьте свои: `.opencode/skills/*/SKILL.md` или `~/.config/opencode/skills/*/SKILL.md`.
**Хотите полное описание возможностей?** Смотрите **документацию по функциям** — агенты, хуки, инструменты, MCP и всё остальное подробно.
------
> **Впервые в oh-my-openagent?** Прочитайте **Обзор**, чтобы понять, что у вас есть, или ознакомьтесь с **руководством по оркестрации**, чтобы узнать, как агенты взаимодействуют.
## Удаление
Чтобы удалить oh-my-openagent:
1.**Удалите плагин из конфига OpenCode**
Отредактируйте `~/.config/opencode/opencode.json` (или `opencode.jsonc`) и уберите `"oh-my-openagent"` из массива `plugin`:
- **Фоновые агенты**: Запускайте несколько агентов параллельно, как настоящая команда разработки
- **Инструменты LSP и AST**: Рефакторинг, переименование, диагностика, поиск кода с учётом AST
- **Инструмент правок на основе хэш-якорей**: Ссылки `LINE#ID` проверяют содержимое перед применением каждого изменения. Хирургические правки, ноль ошибок с устаревшими строками
- **Инъекция контекста**: Автоматическое добавление AGENTS.md, README.md, условных правил
- **Совместимость с Claude Code**: Полная система хуков, команды, навыки, агенты, MCP
Я потратил $24K на токены LLM в личных проектах. Попробовал все инструменты. Настраивал всё до смерти. OpenCode победил.
Каждая проблема, с которой я столкнулся, — её решение уже встроено в этот плагин. Устанавливайте и работайте.
Если OpenCode — это Debian/Arch, то OmO — это Ubuntu/[Omarchy](https://omarchy.org/).
Сильное влияние со стороны [AmpCode](https://ampcode.com) и [Claude Code](https://code.claude.com/docs/overview). Функции портированы, часто улучшены. Продолжаем строить. Это **Open**Code.
Другие обвязки обещают оркестрацию нескольких моделей. Мы её поставляем. Плюс стабильность. Плюс функции, которые реально работают.
Я самый одержимый пользователь этого проекта:
- Какая модель думает острее всего?
- Кто бог отладки?
- Кто пишет лучший код?
- Кто рулит фронтендом?
- Кто владеет бэкендом?
- Что быстрее всего в ежедневной работе?
- Что запускают конкуренты?
Этот плагин — дистилляция. Берём лучшее. Есть улучшения? PR приветствуются.
**Хватит мучиться с выбором обвязки.****Я буду исследовать, воровать лучшее и поставлять это сюда.**
Звучит высокомерно? Знаете, как сделать лучше? Контрибьютьте. Добро пожаловать.
Никакой аффилиации с упомянутыми проектами/моделями. Только личные эксперименты.
99% этого проекта было создано с помощью OpenCode. Я почти не знаю TypeScript. **Но эту документацию я лично просматривал и во многом переписывал.**
## Любимый профессионалами из
- Indent
- Spray — решение для influencer-маркетинга, vovushop — платформа кросс-граничной торговли, vreview — AI-решение для маркетинга отзывов в commerce
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.