Files
oh-my-openagent/docs/guide/agent-model-matching.md
YeonGyu-Kim 4a029258a4 fix: resolve 5 remaining pre-publish blockers (14, 15, 17, 21, 25c)
- completion-promise-detector: restrict to assistant text parts only,
  remove tool_result from completion detection (blocker 14)
- ralph-loop tests: flip tool_result completion expectations to negative
  coverage, add false-positive rejection tests (blocker 15)
- skill tools: merge nativeSkills into initial cachedDescription
  synchronously before any execute() call (blocker 17)
- skill tools test: add assertion for initial description including
  native skills before execute() (blocker 25c)
- docs: sync all 4 fallback-chain docs with model-requirements.ts
  runtime source of truth (blocker 21)

Verified: bun test (4599 pass / 0 fail), tsc --noEmit clean
2026-03-28 15:57:27 +09:00

307 lines
20 KiB
Markdown

# Agent-Model Matching Guide
> **For agents and users**: Why each agent needs a specific model — and how to customize without breaking things.
## The Core Insight: Models Are Developers
Think of AI models as developers on a team. Each has a different brain, different personality, different strengths. **A model isn't just "smarter" or "dumber." It thinks differently.** Give the same instruction to Claude and GPT, and they'll interpret it in fundamentally different ways.
This isn't a bug. It's the foundation of the entire system.
Oh My OpenAgent assigns each agent a model that matches its _working style_ — like building a team where each person is in the role that fits their personality.
### Sisyphus: The Sociable Lead
Sisyphus is the developer who knows everyone, goes everywhere, and gets things done through communication and coordination. Talks to other agents, understands context across the whole codebase, delegates work intelligently, and codes well too. But deep, purely technical problems? He'll struggle a bit.
**This is why Sisyphus uses Claude / Kimi / GLM.** These models excel at:
- Following complex, multi-step instructions (Sisyphus's prompt is ~1,100 lines)
- Maintaining conversation flow across many tool calls
- Understanding nuanced delegation and orchestration patterns
- Producing well-structured, communicative output
Using Sisyphus with older GPT models would be like taking your best project manager — the one who coordinates everyone, runs standups, and keeps the whole team aligned — and sticking them in a room alone to debug a race condition. Wrong fit. GPT-5.4 now has a dedicated Sisyphus prompt path, but GPT is still not the default recommendation for the orchestrator.
### Hephaestus: The Deep Specialist
Hephaestus is the developer who stays in their room coding all day. Doesn't talk much. Might seem socially awkward. But give them a hard technical problem and they'll emerge three hours later with a solution nobody else could have found.
**This is why Hephaestus uses GPT-5.4.** GPT-5.4 is built for exactly this:
- Deep, autonomous exploration without hand-holding
- Multi-file reasoning across complex codebases
- Principle-driven execution (give a goal, not a recipe)
- Working independently for extended periods
Using Hephaestus with GLM or Kimi would be like assigning your most communicative, sociable developer to sit alone and do nothing but deep technical work. They'd get it done eventually, but they wouldn't shine — you'd be wasting exactly the skills that make them valuable.
### The Takeaway
Every agent's prompt is tuned to match its model's personality. **When you change the model, you change the brain — and the same instructions get understood completely differently.** Model matching isn't about "better" or "worse." It's about fit.
---
## How Claude and GPT Think Differently
This matters for understanding why some agents support both model families while others don't.
**Claude** responds to **mechanics-driven** prompts — detailed checklists, templates, step-by-step procedures. More rules = more compliance. You can write a 1,100-line prompt with nested workflows and Claude will follow every step.
**GPT** (especially 5.2+) responds to **principle-driven** prompts — concise principles, XML structure, explicit decision criteria. More rules = more contradiction surface = more drift. GPT works best when you state the goal and let it figure out the mechanics.
Real example: Prometheus's Claude prompt is ~1,100 lines across 7 files. The GPT prompt achieves the same behavior with 3 principles in ~121 lines. Same outcome, completely different approach.
Agents that support both families (Prometheus, Atlas) auto-detect your model at runtime and switch prompts via `isGptModel()`. You don't have to think about it.
---
## Agent Profiles
### Communicators → Claude / Kimi / GLM
These agents have Claude-optimized prompts — long, detailed, mechanics-driven. They need models that reliably follow complex, multi-layered instructions.
| Agent | Role | Fallback Chain | Notes |
| ------------ | ----------------- | -------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **Sisyphus** | Main orchestrator | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/kimi-k2.5 → kimi-for-coding/k2p5 → opencode\|moonshotai\|moonshotai-cn\|firmware\|ollama-cloud\|aihubmix/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → zai-coding-plan\|opencode/glm-5 → opencode/big-pickle | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Metis** | Plan gap analyzer | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → opencode-go/glm-5 → kimi-for-coding/k2p5 | Exact runtime chain from `src/shared/model-requirements.ts`. |
### Dual-Prompt Agents → Claude preferred, GPT supported
These agents ship separate prompts for Claude and GPT families. They auto-detect your model and switch at runtime.
| Agent | Role | Fallback Chain | Notes |
| -------------- | ----------------- | -------------------------------------- | -------------------------------------------------------------------- |
| **Prometheus** | Strategic planner | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → opencode-go/glm-5 → google\|github-copilot\|opencode/gemini-3.1-pro | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Atlas** | Todo orchestrator | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → opencode-go/minimax-m2.7 | Exact runtime chain from `src/shared/model-requirements.ts`. |
### Deep Specialists → GPT
These agents are built for GPT's principle-driven style. Their prompts assume autonomous, goal-oriented execution. Don't override to Claude.
| Agent | Role | Fallback Chain | Notes |
| -------------- | ----------------------- | -------------------------------------- | ------------------------------------------------ |
| **Hephaestus** | Autonomous deep worker | GPT-5.4 (medium) | Requires a GPT-capable provider. The craftsman. |
| **Oracle** | Architecture consultant | openai\|github-copilot\|opencode/gpt-5.4 (high) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Momus** | Ruthless reviewer | openai\|github-copilot\|opencode/gpt-5.4 (xhigh) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → opencode-go/glm-5 | Exact runtime chain from `src/shared/model-requirements.ts`. |
### Utility Runners → Speed over Intelligence
These agents do grep, search, and retrieval. They intentionally use the fastest, cheapest models available. **Don't "upgrade" them to Opus** — that's hiring a senior engineer to file paperwork.
| Agent | Role | Fallback Chain | Notes |
| --------------------- | ------------------ | ---------------------------------------------- | ----------------------------------------------------- |
| **Explore** | Fast codebase grep | github-copilot\|xai/grok-code-fast-1 → opencode-go/minimax-m2.7-highspeed → opencode/minimax-m2.7 → anthropic\|opencode/claude-haiku-4-5 → opencode/gpt-5-nano | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Librarian** | Docs/code search | opencode-go/minimax-m2.7 → opencode/minimax-m2.7-highspeed → anthropic\|opencode/claude-haiku-4-5 → opencode/gpt-5-nano | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Multimodal Looker** | Vision/screenshots | openai\|opencode/gpt-5.4 (medium) → opencode-go/kimi-k2.5 → zai-coding-plan/glm-4.6v → openai\|github-copilot\|opencode/gpt-5-nano | Exact runtime chain from `src/shared/model-requirements.ts`. |
| **Sisyphus-Junior** | Category executor | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/kimi-k2.5 → openai\|github-copilot\|opencode/gpt-5.4 (medium) → opencode-go/minimax-m2.7 → opencode/big-pickle | Exact runtime chain from `src/shared/model-requirements.ts`. |
---
## Model Families
### Claude Family
Communicative, instruction-following, structured output. Best for agents that need to follow complex multi-step prompts.
| Model | Strengths |
| --------------------- | ---------------------------------------------------------------------------- |
| **Claude Opus 4.6** | Best overall. Highest compliance with complex prompts. Default for Sisyphus. |
| **Claude Sonnet 4.6** | Faster, cheaper. Good balance for everyday tasks. |
| **Claude Haiku 4.5** | Fast and cheap. Good for quick tasks and utility work. |
| **Kimi K2.5** | Behaves very similarly to Claude. Great all-rounder at lower cost. |
| **GLM 5** | Claude-like behavior. Solid for orchestration tasks. |
### GPT Family
Principle-driven, explicit reasoning, deep technical capability. Best for agents that work autonomously on complex problems.
| Model | Strengths |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| **GPT-5.3 Codex** | Deep coding powerhouse. Autonomous exploration. Still available for deep category and explicit overrides. |
| **GPT-5.4** | High intelligence, strategic reasoning. Default for Oracle, Momus, and a key fallback for Prometheus / Atlas. Uses xhigh variant for Momus. |
| **GPT-5.4 Mini** | Fast + strong reasoning. Good for lightweight autonomous tasks. Default for quick category. |
| **GPT-5-Nano** | Ultra-cheap, fast. Good for simple utility tasks. |
### Other Models
| Model | Strengths |
| -------------------- | ------------------------------------------------------------------------------------------------------------ |
| **Gemini 3.1 Pro** | Excels at visual/frontend tasks. Different reasoning style. Default for `visual-engineering` and `artistry`. |
| **Gemini 3 Flash** | Fast. Good for doc search and light tasks. |
| **Grok Code Fast 1** | Blazing fast code grep. Default for Explore agent. |
| **MiniMax M2.7** | Fast and smart. Used in OpenCode Go and OpenCode Zen utility fallback chains. |
| **MiniMax M2.7 Highspeed** | High-speed OpenCode catalog entry used in utility fallback chains that prefer the fastest available MiniMax path. |
### OpenCode Go
A premium subscription tier ($10/month) that provides reliable access to Chinese frontier models through OpenCode's infrastructure.
**Available Models:**
| Model | Use Case |
| ------------------------ | --------------------------------------------------------------------- |
| **opencode-go/kimi-k2.5** | Vision-capable, Claude-like reasoning. Used by Sisyphus, Atlas, Sisyphus-Junior, Multimodal Looker. |
| **opencode-go/glm-5** | Text-only orchestration model. Used by Oracle, Prometheus, Metis, Momus. |
| **opencode-go/minimax-m2.7** | Ultra-cheap, fast responses. Used by Librarian, Atlas, and Sisyphus-Junior for utility work. |
| **opencode-go/minimax-m2.7-highspeed** | Even faster OpenCode Go MiniMax entry used by Explore when the high-speed catalog entry is available. |
**When It Gets Used:**
OpenCode Go models appear throughout the fallback chains as intermediate options. Depending on the agent, they can sit before GPT, after GPT, or act as the last structured-model fallback before cheaper utility paths.
**Go-Only Scenarios:**
Some model identifiers like `k2p5` (paid Kimi K2.5) and `glm-5` may only be available through OpenCode Go subscription in certain regions. When configured with these short identifiers, the system resolves them through the opencode-go provider first.
### About Free-Tier Fallbacks
You may see model names like `kimi-k2.5-free`, `minimax-m2.7`, `minimax-m2.7-highspeed`, or `big-pickle` (GLM 4.6) in the source code or logs. These are provider-specific or speed-optimized entries in fallback chains.
You don't need to configure them. The system includes them so it degrades gracefully when you don't have every paid subscription. If you have the paid version, the paid version is always preferred.
---
## Task Categories
When agents delegate work, they don't pick a model name — they pick a **category**. The category maps to the right model automatically.
| Category | When Used | Fallback Chain |
| -------------------- | -------------------------- | -------------------------------------------- |
| `visual-engineering` | Frontend, UI, CSS, design | google\|github-copilot\|opencode/gemini-3.1-pro (high) → zai-coding-plan\|opencode/glm-5 → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 → kimi-for-coding/k2p5 |
| `ultrabrain` | Maximum reasoning needed | openai\|opencode/gpt-5.4 (xhigh) → google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → opencode-go/glm-5 |
| `deep` | Deep coding, complex logic | openai\|opencode/gpt-5.3-codex (medium) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → google\|github-copilot\|opencode/gemini-3.1-pro (high) |
| `artistry` | Creative, novel approaches | google\|github-copilot\|opencode/gemini-3.1-pro (high) → anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 |
| `quick` | Simple, fast tasks | openai\|github-copilot\|opencode/gpt-5.4-mini → anthropic\|github-copilot\|opencode/claude-haiku-4-5 → google\|github-copilot\|opencode/gemini-3-flash → opencode-go/minimax-m2.7 → opencode/gpt-5-nano |
| `unspecified-high` | General complex work | anthropic\|github-copilot\|opencode/claude-opus-4-6 (max) → openai\|github-copilot\|opencode/gpt-5.4 (high) → zai-coding-plan\|opencode/glm-5 → kimi-for-coding/k2p5 → opencode-go/glm-5 → opencode/kimi-k2.5 → opencode\|moonshotai\|moonshotai-cn\|firmware\|ollama-cloud\|aihubmix/kimi-k2.5 |
| `unspecified-low` | General standard work | anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → openai\|opencode/gpt-5.3-codex (medium) → opencode-go/kimi-k2.5 → google\|github-copilot\|opencode/gemini-3-flash → opencode-go/minimax-m2.7 |
| `writing` | Text, docs, prose | google\|github-copilot\|opencode/gemini-3-flash → opencode-go/kimi-k2.5 → anthropic\|github-copilot\|opencode/claude-sonnet-4-6 → opencode-go/minimax-m2.7 |
See the [Orchestration System Guide](./orchestration.md) for how agents dispatch tasks to categories.
---
## Customization
### Example Configuration
```jsonc
{
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/dev/assets/oh-my-opencode.schema.json",
"agents": {
// Main orchestrator: Claude Opus or Kimi K2.5 work best
"sisyphus": {
"model": "kimi-for-coding/k2p5",
"ultrawork": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
},
// Research agents: cheaper models are fine
"librarian": { "model": "google/gemini-3-flash" },
"explore": { "model": "github-copilot/grok-code-fast-1" },
// Architecture consultation: GPT or Claude Opus
"oracle": { "model": "openai/gpt-5.4", "variant": "high" },
// Prometheus inherits sisyphus model; just add prompt guidance
"prometheus": {
"prompt_append": "Leverage deep & quick agents heavily, always in parallel.",
},
},
"categories": {
"quick": { "model": "opencode/gpt-5-nano" },
"unspecified-low": { "model": "anthropic/claude-sonnet-4-6" },
"unspecified-high": { "model": "anthropic/claude-opus-4-6", "variant": "max" },
"visual-engineering": {
"model": "google/gemini-3.1-pro",
"variant": "high",
},
"writing": { "model": "google/gemini-3-flash" },
},
// Limit expensive providers; let cheap ones run freely
"background_task": {
"providerConcurrency": {
"anthropic": 3,
"openai": 3,
"opencode": 10,
"zai-coding-plan": 10,
},
"modelConcurrency": {
"anthropic/claude-opus-4-6": 2,
"opencode/gpt-5-nano": 20,
},
},
}
```
Run `opencode models` to see available models, `opencode auth login` to authenticate providers.
### Safe vs Dangerous Overrides
**Safe** — same personality type:
- Sisyphus: Opus → Sonnet, Kimi K2.5, GLM 5 (all communicative models)
- Prometheus: Opus → GPT-5.4 (auto-switches to the GPT prompt)
- Atlas: Claude Sonnet 4.6 → GPT-5.4 (auto-switches to the GPT prompt)
**Dangerous** — personality mismatch:
- Sisyphus → older GPT models: **Still a bad fit. GPT-5.4 is the only dedicated GPT prompt path.**
- Hephaestus → Claude: **Built for Codex's autonomous style. Claude can't replicate this.**
- Explore → Opus: **Massive cost waste. Explore needs speed, not intelligence.**
- Librarian → Opus: **Same. Doc search doesn't need Opus-level reasoning.**
### How Model Resolution Works
Each agent has a fallback chain. The system tries models in priority order until it finds one available through your connected providers. You don't need to configure providers per model. Just authenticate (`opencode auth login`) and the system figures out which models are available and where.
Core-agent tab cycling is deterministic via injected runtime order field. The fixed priority order is Sisyphus (order: 1), Hephaestus (order: 2), Prometheus (order: 3), and Atlas (order: 4), then the remaining agents follow.
Your explicit configuration always wins. If you set a specific model for an agent, that choice takes precedence even when resolution data is cold.
Variant and `reasoningEffort` overrides are normalized to model-supported values, so cross-provider overrides degrade gracefully instead of failing hard.
Model capabilities are models.dev-backed, with a refreshable cache and capability diagnostics. Use `bunx oh-my-opencode refresh-model-capabilities` to update the cache, or configure `model_capabilities.auto_refresh_on_start` to refresh at startup.
To see which models your agents will actually use, run `bunx oh-my-opencode doctor`. This shows effective model resolution based on your current authentication and config.
```
Agent Request → User Override (if configured) → Fallback Chain → System Default
```
### File-Based Prompts
You can load agent system prompts from external files using `file://` URLs in the `prompt` field, or append additional content with `prompt_append`. The `prompt_append` field also works on categories.
```jsonc
{
"agents": {
"sisyphus": {
"prompt": "file:///path/to/custom-prompt.md"
},
"oracle": {
"prompt_append": "file:///path/to/additional-context.md"
}
},
"categories": {
"deep": {
"prompt_append": "file:///path/to/deep-category-append.md"
}
}
}
```
The file content is loaded at runtime and injected into the agent's system prompt. Supports `~` expansion for home directory and relative `file://` paths.
---
## See Also
- [Installation Guide](./installation.md) — Setup and authentication
- [Orchestration System Guide](./orchestration.md) — How agents dispatch tasks to categories
- [Configuration Reference](../reference/configuration.md) — Full config options
- [`src/shared/model-requirements.ts`](../../src/shared/model-requirements.ts) — Source of truth for fallback chains