oh-my-openagent/docs/guide/agent-model-matching.md
YeonGyu-Kim 6909e5fb4c docs: restructure agent-model guide by model family and role
Complete rewrite organized around model families, agent roles,
task categories, and selection priority rules.

- Model families: Claude-like (Kimi, GLM/Big Pickle), GPT,
  different-behavior (Gemini, MiniMax), speed-focused (Grok, Spark)
- Agent roles: Claude-optimized, dual-prompt, GPT-native, utility
- gpt-5.3-codex-spark: extremely fast but compacts too aggressively
- Big Pickle = GLM 4.6
- Explicit guidance: do not upgrade utility agents to Opus
- opencode models / opencode auth login references at top
- Link to orchestration system guide for task categories
2026-02-19 15:17:41 +09:00


# Agent-Model Matching Guide

**For agents and users:** how to pick the right model for each agent. Read this before customizing model settings.

Run `opencode models` to see all available models on your system, and `opencode auth login` to authenticate with providers.


## Model Families: Know Your Options

Not all models behave the same way. Understanding which models are "similar" helps you make safe substitutions.

### Claude-like Models (instruction-following, structured output)

These models respond similarly to Claude and work well with oh-my-opencode's Claude-optimized prompts:

| Model | Provider(s) | Notes |
| --- | --- | --- |
| Claude Opus 4.6 | anthropic, github-copilot, opencode | Best overall. Default for Sisyphus. |
| Claude Sonnet 4.6 | anthropic, github-copilot, opencode | Faster, cheaper. Good balance. |
| Claude Haiku 4.5 | anthropic, opencode | Fast and cheap. Good for quick tasks. |
| Kimi K2.5 | kimi-for-coding | Behaves very similarly to Claude. Great all-rounder. Default for Atlas. |
| Kimi K2.5 Free | opencode | Free-tier Kimi. Rate-limited but functional. |
| GLM 5 | zai-coding-plan, opencode | Claude-like behavior. Good for broad tasks. |
| Big Pickle (GLM 4.6) | opencode | Free-tier GLM. Decent fallback. |

### GPT Models (explicit reasoning, principle-driven)

GPT models need differently structured prompts. Some agents auto-detect GPT and switch prompts:

| Model | Provider(s) | Notes |
| --- | --- | --- |
| GPT-5.3-codex | openai, github-copilot, opencode | Deep coding powerhouse. Required for Hephaestus. |
| GPT-5.2 | openai, github-copilot, opencode | High intelligence. Default for Oracle. |
| GPT-5-Nano | opencode | Ultra-cheap, fast. Good for simple utility tasks. |

### Different-Behavior Models

These models have unique characteristics — don't assume they'll behave like Claude or GPT:

| Model | Provider(s) | Notes |
| --- | --- | --- |
| Gemini 3 Pro | google, github-copilot, opencode | Excels at visual/frontend tasks. Different reasoning style. |
| Gemini 3 Flash | google, github-copilot, opencode | Fast, good for doc search and light tasks. |
| MiniMax M2.5 | venice | Fast and smart. Good for utility tasks. |
| MiniMax M2.5 Free | opencode | Free-tier MiniMax. Fast for search/retrieval. |

### Speed-Focused Models

| Model | Provider(s) | Speed | Notes |
| --- | --- | --- | --- |
| Grok Code Fast 1 | github-copilot, venice | Very fast | Optimized for code grep/search. Default for Explore. |
| Claude Haiku 4.5 | anthropic, opencode | Fast | Good balance of speed and intelligence. |
| MiniMax M2.5 (Free) | opencode, venice | Fast | Smart for its speed class. |
| GPT-5.3-codex-spark | openai | Extremely fast | Blazing fast, but compacts so aggressively that oh-my-opencode's context management doesn't work well with it. Not recommended for omo agents. |

## Claude-Optimized Agents

These agents have prompts tuned for Claude-family models. Use Claude > Kimi K2.5 > GLM 5 in that priority order.

| Agent | Role | Default Chain | What It Does |
| --- | --- | --- | --- |
| Sisyphus | Main ultraworker | Opus (max) → Kimi K2.5 → GLM 5 → Big Pickle | Primary coding agent. Orchestrates everything. Never use GPT — no GPT prompt exists. |
| Metis | Plan review | Opus (max) → Kimi K2.5 → GPT-5.2 → Gemini 3 Pro | Reviews Prometheus plans for gaps. |
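The "Default Chain" column is a fallback order: the first model that is actually available on your system wins. A minimal TypeScript sketch of that idea (the helper and the provider/model ids below are illustrative, not oh-my-opencode's actual API):

```typescript
// Hypothetical sketch: resolve a fallback chain to the first available model.
// `available` stands in for whatever `opencode models` reports on your system.
type ModelId = string;

function resolveChain(chain: ModelId[], available: Set<ModelId>): ModelId | undefined {
  // Walk the chain in priority order; the first installed model wins.
  return chain.find((model) => available.has(model));
}

// Sisyphus's chain from the table above, written as assumed provider/model ids.
const sisyphusChain: ModelId[] = [
  "anthropic/claude-opus-4-6",
  "kimi-for-coding/k2p5",
  "zai-coding-plan/glm-5",
  "opencode/big-pickle",
];
```

If none of the chain's models are available, the resolver returns `undefined`, which is the moment to run `opencode auth login` for a missing provider.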

## Dual-Prompt Agents (Claude + GPT auto-switch)

These agents detect your model family at runtime and switch to the appropriate prompt. If you have GPT access, these agents can use it effectively.

Priority: Claude > GPT > Claude-like models

| Agent | Role | Default Chain | GPT Prompt? |
| --- | --- | --- | --- |
| Prometheus | Strategic planner | Opus (max) → GPT-5.2 (high) → Kimi K2.5 → Gemini 3 Pro | Yes — XML-tagged, principle-driven (~300 lines vs ~1,100 for Claude) |
| Atlas | Todo orchestrator | Kimi K2.5 → Sonnet → GPT-5.2 | Yes — GPT-optimized todo management |

## GPT-Native Agents

These agents are built for GPT. Don't override to Claude.

| Agent | Role | Default Chain | Notes |
| --- | --- | --- | --- |
| Hephaestus | Deep autonomous worker | GPT-5.3-codex (medium) only | "Codex on steroids." No fallback. Requires GPT access. |
| Oracle | Architecture/debugging | GPT-5.2 (high) → Gemini 3 Pro → Opus | High-IQ strategic backup. GPT preferred. |
| Momus | High-accuracy reviewer | GPT-5.2 (medium) → Opus → Gemini 3 Pro | Verification agent. GPT preferred. |

## Utility Agents (Speed > Intelligence)

These agents do search, grep, and retrieval. They intentionally use fast, cheap models. Don't "upgrade" them to Opus — it wastes tokens on simple tasks.

| Agent | Role | Default Chain | Design Rationale |
| --- | --- | --- | --- |
| Explore | Fast codebase grep | MiniMax M2.5 Free → Grok Code Fast → MiniMax M2.5 → Haiku → GPT-5-Nano | Speed is everything. Grok is blazing fast for grep. |
| Librarian | Docs/code search | MiniMax M2.5 Free → Gemini Flash → Big Pickle | Entirely free-tier. Doc retrieval doesn't need deep reasoning. |
| Multimodal Looker | Vision/screenshots | Kimi K2.5 → Kimi Free → Gemini Flash → GPT-5.2 → GLM-4.6v | Kimi excels at multimodal understanding. |

## Task Categories

Categories control which model is used for `background_task` and `delegate_task`. See the Orchestration System Guide for how agents dispatch tasks to categories.

| Category | When Used | Recommended Models | Notes |
| --- | --- | --- | --- |
| visual-engineering | Frontend, UI, CSS, design | Gemini 3 Pro (high) → GLM 5 → Opus → Kimi K2.5 | Gemini dominates visual tasks |
| ultrabrain | Maximum reasoning needed | GPT-5.3-codex (xhigh) → Gemini 3 Pro → Opus | Highest intelligence available |
| deep | Deep coding, complex logic | GPT-5.3-codex (medium) → Opus → Gemini 3 Pro | Requires GPT availability |
| artistry | Creative, novel approaches | Gemini 3 Pro (high) → Opus → GPT-5.2 | Requires Gemini availability |
| quick | Simple, fast tasks | Haiku → Gemini Flash → GPT-5-Nano | Cheapest and fastest |
| unspecified-high | General complex work | Opus (max) → GPT-5.2 (high) → Gemini 3 Pro | Default when no category fits |
| unspecified-low | General standard work | Sonnet → GPT-5.3-codex (medium) → Gemini Flash | Everyday tasks |
| writing | Text, docs, prose | Kimi K2.5 → Gemini Flash → Sonnet | Kimi produces best prose |
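Conceptually, a category is just a named fallback chain. A hypothetical TypeScript sketch of the mapping (the function, the chain contents, and the provider/model ids are illustrative; they are not the real dispatch code):

```typescript
// Hypothetical sketch: category -> model-chain mapping for task dispatch.
const categoryChains: Record<string, string[]> = {
  quick: ["anthropic/claude-haiku-4-5", "google/gemini-3-flash", "opencode/gpt-5-nano"],
  writing: ["kimi-for-coding/k2p5", "google/gemini-3-flash", "anthropic/claude-sonnet-4-6"],
  "unspecified-high": ["anthropic/claude-opus-4-6", "openai/gpt-5.2", "google/gemini-3-pro"],
};

// A delegate_task-style call would look up the chain for its category,
// falling back to the general-purpose chain when no category matches.
function chainFor(category: string): string[] {
  return categoryChains[category] ?? categoryChains["unspecified-high"] ?? [];
}
```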

## Why Different Models Need Different Prompts

Claude and GPT models have fundamentally different instruction-following behaviors:

- **Claude models** respond well to mechanics-driven prompts — detailed checklists, templates, step-by-step procedures. More rules = more compliance.
- **GPT models** (especially 5.2+) respond better to principle-driven prompts — concise principles, XML-tagged structure, explicit decision criteria. More rules = more contradiction surface = more drift.

**Key insight from Codex Plan Mode analysis:**

- Codex Plan Mode achieves in ~121 lines and 3 principles what Prometheus's Claude prompt needs ~1,100 lines across 7 files to accomplish
- The core concept is "Decision Complete" — a plan must leave ZERO decisions to the implementer
- GPT follows this literally when stated as a principle; Claude needs enforcement mechanisms

This is why Prometheus and Atlas ship separate prompts per model family — they auto-detect and switch at runtime via `isGptModel()`.
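A minimal sketch of what such a predicate might look like, assuming it keys off the model id's name segment (the actual `isGptModel()` in oh-my-opencode may differ):

```typescript
// Hypothetical sketch of a model-family predicate used to pick a prompt variant.
function isGptModel(modelId: string): boolean {
  // Match GPT/codex ids such as "openai/gpt-5.2" or "openai/gpt-5.3-codex".
  const name = modelId.split("/").pop() ?? modelId;
  return /^gpt-/i.test(name) || name.includes("codex");
}

function promptFor(modelId: string): "gpt" | "claude" {
  // Dual-prompt agents (Prometheus, Atlas) would switch here at runtime.
  return isGptModel(modelId) ? "gpt" : "claude";
}
```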


## Customization Guide

### How to Customize

Override in `oh-my-opencode.json`:

```jsonc
{
  "agents": {
    "sisyphus": { "model": "kimi-for-coding/k2p5" },
    "prometheus": { "model": "openai/gpt-5.2" }  // Auto-switches to GPT prompt
  }
}
```

### Selection Priority

When choosing models for Claude-optimized agents:

Claude (Opus/Sonnet) > GPT (if agent has dual prompt) > Claude-like (Kimi K2.5, GLM 5)

When choosing models for GPT-native agents:

GPT (5.3-codex, 5.2) > Claude Opus (decent fallback) > Gemini (acceptable)

### Safe vs Dangerous Overrides

**Safe (same family):**

- Sisyphus: Opus → Sonnet, Kimi K2.5, GLM 5
- Prometheus: Opus → GPT-5.2 (auto-switches prompt)
- Atlas: Kimi K2.5 → Sonnet, GPT-5.2 (auto-switches)

**Dangerous (no prompt support):**

- Sisyphus → GPT: No GPT prompt. Will degrade significantly.
- Hephaestus → Claude: Built for Codex. Claude can't replicate this.
- Explore → Opus: Massive cost waste. Explore needs speed, not intelligence.
- Librarian → Opus: Same. Doc search doesn't need Opus-level reasoning.

## Provider Priority

Native (`anthropic/`, `openai/`, `google/`) > Kimi for Coding > GitHub Copilot > Venice > OpenCode Zen > Z.ai Coding Plan
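When the same model is reachable through several providers, the list above can be encoded as a ranking. A hypothetical sketch (the function and provider slugs are illustrative assumptions):

```typescript
// Hypothetical sketch: rank provider slugs per the priority list above.
const providerRank = [
  "anthropic", "openai", "google", // native providers first
  "kimi-for-coding",
  "github-copilot",
  "venice",
  "opencode",                      // OpenCode Zen
  "zai-coding-plan",               // Z.ai Coding Plan
];

function preferredProvider(candidates: string[]): string | undefined {
  // Lower index = higher priority; unknown providers sort last.
  const rank = (p: string) => {
    const i = providerRank.indexOf(p);
    return i === -1 ? providerRank.length : i;
  };
  return [...candidates].sort((a, b) => rank(a) - rank(b))[0];
}
```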

## See Also