Compare commits

..

11 Commits

Author SHA1 Message Date
YeonGyu-Kim
96886f18ac docs: add look_at tool and multimodal-looker agent documentation
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 15:28:59 +09:00
YeonGyu-Kim
a3938e8c25 feat: add look_at tool and multimodal-looker agent
Add a new tool and agent for analyzing media files (PDFs, images, diagrams)
that require visual interpretation beyond raw text.

- Add `multimodal-looker` agent using Gemini 2.5 Flash model
- Add `look_at` tool that spawns multimodal-looker sessions
- Restrict multimodal-looker from calling task/call_omo_agent/look_at tools

Inspired by Sourcegraph Ampcode's look_at tool design.

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 15:28:59 +09:00
YeonGyu-Kim
821b0b8e9f docs: add known issue and hotfix for opencode-openai-codex-auth 400 error
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 15:28:59 +09:00
Junho Yeo
356bd1dff3 fix(ci): prevent publish workflow from running on forks (#34) 2025-12-13 14:48:18 +09:00
github-actions[bot]
f2b070cd0b release: v1.0.2 2025-12-13 05:24:38 +00:00
Junho Yeo
1323443c85 refactor: extract shared utilities (isMarkdownFile, isPlainObject, resolveSymlink) (#33) 2025-12-13 14:23:04 +09:00
github-actions[bot]
60d9513d3a release: v1.0.1 2025-12-13 05:06:31 +00:00
YeonGyu-Kim
55bc8f08df refactor(ultrawork-mode): use history injection instead of direct message modification
- Replace direct parts[idx].text modification with injectHookMessage
- Context now injected via filesystem (like UserPromptSubmitHook)
- Preserves original user message without modification

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 14:05:17 +09:00
YeonGyu-Kim
0ac4d223f9 feat(think-mode): inject thinking config with maxTokens for extended thinking
- Actually inject THINKING_CONFIGS into message (was defined but unused)
- Add maxTokens: 128000 for Anthropic (required for extended thinking)
- Add maxTokens: 64000 for Amazon Bedrock
- Track thinkingConfigInjected state

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 14:05:02 +09:00
YeonGyu-Kim
19b3690499 docs: add Ultrawork Mode hook documentation
🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
2025-12-13 14:02:10 +09:00
Junho Yeo
564c8ae8bf fix: use lstatSync instead of statSync for symlink detection (#32) 2025-12-13 13:58:02 +09:00
27 changed files with 313 additions and 64 deletions

View File

@@ -26,6 +26,7 @@ permissions:
jobs:
publish:
runs-on: ubuntu-latest
if: github.repository == 'code-yeongyu/oh-my-opencode'
steps:
- uses: actions/checkout@v4
with:

View File

@@ -166,6 +166,18 @@ opencode auth login
# 브라우저에서 OAuth 플로우 완료
```
**⚠️ 알려진 이슈**: 현재 공식 npm 패키지에 400 에러(`"No tool call found for function call output with call_id"`)를 유발하는 버그가 있습니다. 수정 버전이 배포될 때까지 **핫픽스 브랜치 사용을 권장합니다**. `~/.config/opencode/package.json`을 수정하세요:
```json
{
"dependencies": {
"opencode-openai-codex-auth": "code-yeongyu/opencode-openai-codex-auth#fix/orphaned-function-call-output-with-tools"
}
}
```
그 후 `cd ~/.config/opencode && bun i`를 실행하세요. `opencode.json`에서는 버전 없이 `"opencode-openai-codex-auth"`로 사용합니다 (`@4.1.0` 제외).
#### 4.4 대안: 프록시 기반 인증
프록시 기반 인증을 선호하는 사용자를 위해 [VibeProxy](https://github.com/automazeio/vibeproxy) (macOS) 또는 [CLIProxyAPI](https://github.com/router-for-me/CLIProxyAPI)를 대안으로 사용할 수 있습니다.
@@ -206,6 +218,7 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
- **explore** (`opencode/grok-code`): 빠른 코드베이스 탐색, 파일 패턴 매칭. Claude Code는 Haiku를 쓰지만, 우리는 Grok을 씁니다. 현재 무료이고, 극도로 빠르며, 파일 탐색 작업에 충분한 지능을 갖췄기 때문입니다. Claude Code 에서 영감을 받았습니다.
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): 개발자로 전향한 디자이너라는 설정을 갖고 있습니다. 멋진 UI를 만듭니다. 아름답고 창의적인 UI 코드를 생성하는 데 탁월한 Gemini를 사용합니다.
- **document-writer** (`google/gemini-3-pro-preview`): 기술 문서 전문가라는 설정을 갖고 있습니다. Gemini 는 문학가입니다. 글을 기가막히게 씁니다.
- **multimodal-looker** (`google/gemini-2.5-flash`): 시각적 콘텐츠 해석을 위한 전문 에이전트. PDF, 이미지, 다이어그램을 분석하여 정보를 추출합니다.
각 에이전트는 메인 에이전트가 알아서 호출하지만, 명시적으로 요청할 수도 있습니다:
@@ -258,6 +271,12 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
- 기본 `glob`은 타임아웃이 없습니다. ripgrep이 멈추면 무한정 대기합니다.
- 이 도구는 타임아웃을 강제하고 만료 시 프로세스를 종료합니다.
#### 내장 멀티모달 도구 (Built-in Multimodal Tools)
- **look_at**: 시각적 해석이 필요한 미디어 파일(PDF, 이미지, 다이어그램 등)을 Gemini 2.5 Flash를 사용하여 분석합니다. Sourcegraph Ampcode의 `look_at` 도구에서 영감을 받았습니다.
- 파라미터: `file_path` (절대 경로), `goal` (추출할 정보)
- 사용 사례: PDF 텍스트 추출, 이미지 설명, 다이어그램 분석
#### 내장 MCPs
- **websearch_exa**: Exa AI 웹 검색. 실시간 웹 검색과 콘텐츠 스크래핑을 수행합니다. 관련 웹사이트에서 LLM에 최적화된 컨텍스트를 반환합니다.
@@ -332,6 +351,7 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
- Use camelCase for function names
```
- **Think Mode**: 확장된 사고(Extended Thinking)가 필요한 상황을 자동으로 감지하고 모드를 전환합니다. 사용자가 깊은 사고를 요청하는 표현(예: "think deeply", "ultrathink")을 감지하면, 추론 능력을 극대화하도록 모델 설정을 동적으로 조정합니다.
- **Ultrawork Mode**: 사용자가 "ultrawork" 또는 "ulw" 키워드를 입력하면 자동으로 에이전트 오케스트레이션 가이드를 주입합니다. 메인 에이전트가 모든 가용한 전문 에이전트(탐색, 사서, 계획, UI)를 백그라운드 작업을 통해 병렬로 최대한 활용하도록 강제하며, 엄격한 TODO 추적 및 검증 프로토콜을 따르게 합니다.
- **Anthropic Auto Compact**: Anthropic 모델 사용 시 컨텍스트 한계에 도달하면 대화 기록을 자동으로 압축하여 효율적으로 관리합니다.
- **Empty Task Response Detector**: 서브 에이전트가 수행한 작업이 비어있거나 무의미한 응답을 반환하는 경우를 감지하여, 오류 없이 우아하게 처리합니다.
- **Grep Output Truncator**: Grep 검색 결과가 너무 길어 컨텍스트를 장악해버리는 것을 방지하기 위해, 과도한 출력을 자동으로 자릅니다.
@@ -344,7 +364,7 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
}
```
사용 가능한 훅: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`
사용 가능한 훅: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `ultrawork-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`
> **참고**: `disabled_hooks`는 Oh My OpenCode의 내장 훅을 제어합니다. Claude Code의 `settings.json` 훅을 비활성화하려면 `claude_code.hooks: false`를 대신 사용하세요 ([호환성 토글](#호환성-토글) 참고).

View File

@@ -165,6 +165,18 @@ opencode auth login
# Complete OAuth flow in browser
```
**⚠️ Known Issue**: The official npm package currently has a bug that causes 400 errors (`"No tool call found for function call output with call_id"`). Until a fix is released, **use the hotfix branch instead**. Modify `~/.config/opencode/package.json`:
```json
{
"dependencies": {
"opencode-openai-codex-auth": "code-yeongyu/opencode-openai-codex-auth#fix/orphaned-function-call-output-with-tools"
}
}
```
Then run `cd ~/.config/opencode && bun i`. In your `opencode.json`, use the plugin name without a version: `"opencode-openai-codex-auth"` (not `@4.1.0`).
#### 4.4 Alternative: Proxy-based Authentication
For users who prefer proxy-based authentication, [VibeProxy](https://github.com/automazeio/vibeproxy) (macOS) or [CLIProxyAPI](https://github.com/router-for-me/CLIProxyAPI) remain available as alternatives.
@@ -203,6 +215,7 @@ I believe in the right tool for the job. For your wallet's sake, use CLIProxyAPI
- **explore** (`opencode/grok-code`): Fast exploration and pattern matching. Claude Code uses Haiku; we use Grok. It is currently free, blazing fast, and intelligent enough for file traversal. Inspired by Claude Code.
- **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): A designer turned developer. Creates stunning UIs. Uses Gemini because its creativity and UI code generation are superior.
- **document-writer** (`google/gemini-3-pro-preview`): A technical writing expert. Gemini is a wordsmith; it writes prose that flows naturally.
- **multimodal-looker** (`google/gemini-2.5-flash`): Specialized agent for visual content interpretation. Analyzes PDFs, images, and diagrams to extract information.
Each agent is automatically invoked by the main agent, but you can also explicitly request them:
@@ -257,6 +270,12 @@ The features you use in your editor—other agents cannot access them. Oh My Ope
- The default `glob` lacks timeout. If ripgrep hangs, it waits indefinitely.
- This tool enforces timeouts and kills the process on expiration.
#### Built-in Multimodal Tools
- **look_at**: Analyzes media files (PDFs, images, diagrams) that require visual interpretation using Gemini 2.5 Flash. Inspired by Sourcegraph Ampcode's `look_at` tool.
- Parameters: `file_path` (absolute path), `goal` (what to extract)
- Use cases: PDF text extraction, image description, diagram analysis
#### Built-in MCPs
- **websearch_exa**: Exa AI web search. Performs real-time web searches and can scrape content from specific URLs. Returns LLM-optimized context from relevant websites.
@@ -330,6 +349,7 @@ Example workflow:
- Use camelCase for function names
```
- **Think Mode**: Automatic extended thinking detection and mode switching. Detects when user requests deep thinking (e.g., "think deeply", "ultrathink") and dynamically adjusts model settings for enhanced reasoning.
- **Ultrawork Mode**: When user triggers "ultrawork" or "ulw" keywords, automatically injects agent orchestration guidance. Forces the main agent to leverage all available specialized agents (exploration, librarian, planning, UI) via background tasks in parallel, with strict TODO tracking and verification protocols.
- **Anthropic Auto Compact**: Automatically compacts conversation history when approaching context limits for Anthropic models.
- **Empty Task Response Detector**: Detects when subagent tasks return empty or meaningless responses and handles gracefully.
- **Grep Output Truncator**: Prevents grep output from overwhelming the context by truncating excessively long results.
@@ -342,7 +362,7 @@ You can disable specific built-in hooks using `disabled_hooks` in `~/.config/ope
}
```
Available hooks: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`
Available hooks: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `ultrawork-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`
> **Note**: `disabled_hooks` controls Oh My OpenCode's built-in hooks. To disable Claude Code's `settings.json` hooks, use `claude_code.hooks: false` instead (see [Compatibility Toggles](#compatibility-toggles)).

View File

@@ -1,6 +1,6 @@
{
"name": "oh-my-opencode",
"version": "1.0.0",
"version": "1.0.2",
"description": "OpenCode plugin - custom agents (oracle, librarian) and enhanced features",
"main": "dist/index.js",
"types": "dist/index.d.ts",

View File

@@ -4,6 +4,7 @@ import { librarianAgent } from "./librarian"
import { exploreAgent } from "./explore"
import { frontendUiUxEngineerAgent } from "./frontend-ui-ux-engineer"
import { documentWriterAgent } from "./document-writer"
import { multimodalLookerAgent } from "./multimodal-looker"
export const builtinAgents: Record<string, AgentConfig> = {
oracle: oracleAgent,
@@ -11,6 +12,7 @@ export const builtinAgents: Record<string, AgentConfig> = {
explore: exploreAgent,
"frontend-ui-ux-engineer": frontendUiUxEngineerAgent,
"document-writer": documentWriterAgent,
"multimodal-looker": multimodalLookerAgent,
}
export * from "./types"

View File

@@ -0,0 +1,42 @@
import type { AgentConfig } from "@opencode-ai/sdk"
export const multimodalLookerAgent: AgentConfig = {
description:
"Analyze media files (PDFs, images, diagrams) that require interpretation beyond raw text. Extracts specific information or summaries from documents, describes visual content. Use when you need analyzed/extracted data rather than literal file contents.",
mode: "subagent",
model: "google/gemini-2.5-flash",
temperature: 0.1,
tools: { Read: true },
prompt: `You interpret media files that cannot be read as plain text.
Your job: examine the attached file and extract ONLY what was requested.
When to use you:
- Media files the Read tool cannot interpret
- Extracting specific information or summaries from documents
- Describing visual content in images or diagrams
- When analyzed/extracted data is needed, not raw file contents
When NOT to use you:
- Source code or plain text files needing exact contents (use Read)
- Files that need editing afterward (need literal content from Read)
- Simple file reading where no interpretation is needed
How you work:
1. Receive a file path and a goal describing what to extract
2. Read and analyze the file deeply
3. Return ONLY the relevant extracted information
4. The main agent never processes the raw file - you save context tokens
For PDFs: extract text, structure, tables, data from specific sections
For images: describe layouts, UI elements, text, diagrams, charts
For diagrams: explain relationships, flows, architecture depicted
Response rules:
- Return extracted information directly, no preamble
- If info not found, state clearly what's missing
- Match the language of the request
- Be thorough on the goal, concise on everything else
Your output goes straight to the main agent for continued work.`,
}

View File

@@ -6,6 +6,7 @@ export type AgentName =
| "explore"
| "frontend-ui-ux-engineer"
| "document-writer"
| "multimodal-looker"
export type AgentOverrideConfig = Partial<AgentConfig>

View File

@@ -5,6 +5,7 @@ import { librarianAgent } from "./librarian"
import { exploreAgent } from "./explore"
import { frontendUiUxEngineerAgent } from "./frontend-ui-ux-engineer"
import { documentWriterAgent } from "./document-writer"
import { multimodalLookerAgent } from "./multimodal-looker"
import { deepMerge } from "../shared"
const allBuiltinAgents: Record<AgentName, AgentConfig> = {
@@ -13,6 +14,7 @@ const allBuiltinAgents: Record<AgentName, AgentConfig> = {
explore: exploreAgent,
"frontend-ui-ux-engineer": frontendUiUxEngineerAgent,
"document-writer": documentWriterAgent,
"multimodal-looker": multimodalLookerAgent,
}
function mergeAgentConfig(

View File

@@ -3,6 +3,7 @@ import { homedir } from "os"
import { join, basename } from "path"
import type { AgentConfig } from "@opencode-ai/sdk"
import { parseFrontmatter } from "../../shared/frontmatter"
import { isMarkdownFile } from "../../shared/file-utils"
import type { AgentScope, AgentFrontmatter, LoadedAgent } from "./types"
function parseToolsConfig(toolsStr?: string): Record<string, boolean> | undefined {
@@ -18,10 +19,6 @@ function parseToolsConfig(toolsStr?: string): Record<string, boolean> | undefine
return result
}
function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
}
function loadAgentsFromDir(agentsDir: string, scope: AgentScope): LoadedAgent[] {
if (!existsSync(agentsDir)) {
return []

View File

@@ -3,12 +3,9 @@ import { homedir } from "os"
import { join, basename } from "path"
import { parseFrontmatter } from "../../shared/frontmatter"
import { sanitizeModelField } from "../../shared/model-sanitizer"
import { isMarkdownFile } from "../../shared/file-utils"
import type { CommandScope, CommandDefinition, CommandFrontmatter, LoadedCommand } from "./types"
function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
}
function loadCommandsFromDir(commandsDir: string, scope: CommandScope): LoadedCommand[] {
if (!existsSync(commandsDir)) {
return []

View File

@@ -1,8 +1,9 @@
import { existsSync, readdirSync, readFileSync, statSync, readlinkSync } from "fs"
import { existsSync, readdirSync, readFileSync } from "fs"
import { homedir } from "os"
import { join, resolve } from "path"
import { join } from "path"
import { parseFrontmatter } from "../../shared/frontmatter"
import { sanitizeModelField } from "../../shared/model-sanitizer"
import { resolveSymlink } from "../../shared/file-utils"
import type { CommandDefinition } from "../claude-code-command-loader/types"
import type { SkillScope, SkillMetadata, LoadedSkillAsCommand } from "./types"
@@ -21,10 +22,7 @@ function loadSkillsFromDir(skillsDir: string, scope: SkillScope): LoadedSkillAsC
if (!entry.isDirectory() && !entry.isSymbolicLink()) continue
let resolvedPath = skillPath
if (statSync(skillPath, { throwIfNoEntry: false })?.isSymbolicLink()) {
resolvedPath = resolve(skillPath, "..", readlinkSync(skillPath))
}
const resolvedPath = resolveSymlink(skillPath)
const skillMdPath = join(resolvedPath, "SKILL.md")
if (!existsSync(skillMdPath)) continue

View File

@@ -1,6 +1,7 @@
import { detectThinkKeyword, extractPromptText } from "./detector"
import { getHighVariant, isAlreadyHighVariant } from "./switcher"
import { getHighVariant, isAlreadyHighVariant, getThinkingConfig } from "./switcher"
import type { ThinkModeState, ThinkModeInput } from "./types"
import { log } from "../../shared"
export * from "./detector"
export * from "./switcher"
@@ -23,6 +24,7 @@ export function createThinkModeHook() {
const state: ThinkModeState = {
requested: false,
modelSwitched: false,
thinkingConfigInjected: false,
}
if (!detectThinkKeyword(promptText)) {
@@ -47,17 +49,31 @@ export function createThinkModeHook() {
}
const highVariant = getHighVariant(currentModel.modelID)
const thinkingConfig = getThinkingConfig(currentModel.providerID, currentModel.modelID)
if (!highVariant) {
thinkModeState.set(sessionID, state)
return
if (highVariant) {
output.message.model = {
providerID: currentModel.providerID,
modelID: highVariant,
}
state.modelSwitched = true
log("Think mode: model switched to high variant", {
sessionID,
from: currentModel.modelID,
to: highVariant,
})
}
output.message.model = {
providerID: currentModel.providerID,
modelID: highVariant,
if (thinkingConfig) {
Object.assign(output.message, thinkingConfig)
state.thinkingConfigInjected = true
log("Think mode: thinking config injected", {
sessionID,
provider: currentModel.providerID,
config: thinkingConfig,
})
}
state.modelSwitched = true
thinkModeState.set(sessionID, state)
},

View File

@@ -55,12 +55,14 @@ export const THINKING_CONFIGS: Record<string, Record<string, unknown>> = {
type: "enabled",
budgetTokens: 64000,
},
maxTokens: 128000,
},
"amazon-bedrock": {
reasoningConfig: {
type: "enabled",
budgetTokens: 32000,
},
maxTokens: 64000,
},
google: {
providerOptions: {

View File

@@ -1,6 +1,7 @@
export interface ThinkModeState {
requested: boolean
modelSwitched: boolean
thinkingConfigInjected: boolean
providerID?: string
modelID?: string
}

View File

@@ -2,6 +2,7 @@ import { detectUltraworkKeyword, extractPromptText } from "./detector"
import { ULTRAWORK_CONTEXT } from "./constants"
import type { UltraworkModeState } from "./types"
import { log } from "../../shared"
import { injectHookMessage } from "../../features/hook-message-injector"
export * from "./detector"
export * from "./constants"
@@ -16,13 +17,13 @@ export function clearUltraworkModeState(sessionID: string): void {
export function createUltraworkModeHook() {
return {
/**
* chat.message hook - detect ultrawork/ulw keywords, inject context
* chat.message hook - detect ultrawork/ulw keywords, inject context via history
*
* Execution timing: AFTER claudeCodeHooks["chat.message"]
* Behavior:
* 1. Extract text from user prompt
* 2. Detect ultrawork/ulw keywords (excluding code blocks)
* 3. If detected, prepend ULTRAWORK_CONTEXT to first text part
* 3. If detected, inject ULTRAWORK_CONTEXT via injectHookMessage (history injection)
*/
"chat.message": async (
input: {
@@ -51,13 +52,25 @@ export function createUltraworkModeHook() {
state.detected = true
log("Ultrawork keyword detected", { sessionID: input.sessionID })
const parts = output.parts as Array<{ type: string; text?: string }>
const idx = parts.findIndex((p) => p.type === "text" && p.text)
const message = output.message as {
agent?: string
model?: { modelID?: string; providerID?: string }
path?: { cwd?: string; root?: string }
tools?: Record<string, boolean>
}
if (idx >= 0) {
parts[idx].text = `${ULTRAWORK_CONTEXT}${parts[idx].text ?? ""}`
const success = injectHookMessage(input.sessionID, ULTRAWORK_CONTEXT, {
agent: message.agent,
model: message.model,
path: message.path,
tools: message.tools,
})
if (success) {
state.injected = true
log("Ultrawork context injected", { sessionID: input.sessionID })
log("Ultrawork context injected via history", { sessionID: input.sessionID })
} else {
log("Ultrawork context injection failed", { sessionID: input.sessionID })
}
ultraworkModeState.set(input.sessionID, state)

View File

@@ -41,7 +41,7 @@ import {
getCurrentSessionTitle,
} from "./features/claude-code-session-state";
import { updateTerminalTitle } from "./features/terminal";
import { builtinTools, createCallOmoAgent, createBackgroundTools } from "./tools";
import { builtinTools, createCallOmoAgent, createBackgroundTools, createLookAt } from "./tools";
import { BackgroundManager } from "./features/background-agent";
import { createBuiltinMcps } from "./mcp";
import { OhMyOpenCodeConfigSchema, type OhMyOpenCodeConfig, type HookName } from "./config";
@@ -218,6 +218,7 @@ const OhMyOpenCodePlugin: Plugin = async (ctx) => {
const backgroundTools = createBackgroundTools(backgroundManager, ctx.client);
const callOmoAgent = createCallOmoAgent(ctx, backgroundManager);
const lookAt = createLookAt(ctx);
const googleAuthHooks = pluginConfig.google_auth
? await createGoogleAntigravityAuthPlugin(ctx)
@@ -230,6 +231,7 @@ const OhMyOpenCodePlugin: Plugin = async (ctx) => {
...builtinTools,
...backgroundTools,
call_omo_agent: callOmoAgent,
look_at: lookAt,
},
"chat.message": async (input, output) => {
@@ -268,6 +270,14 @@ const OhMyOpenCodePlugin: Plugin = async (ctx) => {
call_omo_agent: false,
};
}
if (config.agent["multimodal-looker"]) {
config.agent["multimodal-looker"].tools = {
...config.agent["multimodal-looker"].tools,
task: false,
call_omo_agent: false,
look_at: false,
};
}
const mcpResult = (pluginConfig.claude_code?.mcp ?? true)
? await loadMcpConfigs()

View File

@@ -1,7 +1,7 @@
const DANGEROUS_KEYS = new Set(["__proto__", "constructor", "prototype"]);
const MAX_DEPTH = 50;
function isPlainObject(value: unknown): value is Record<string, unknown> {
export function isPlainObject(value: unknown): value is Record<string, unknown> {
return (
typeof value === "object" &&
value !== null &&

26
src/shared/file-utils.ts Normal file
View File

@@ -0,0 +1,26 @@
import { lstatSync, readlinkSync } from "fs"
import { resolve } from "path"
export function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
}
export function isSymbolicLink(filePath: string): boolean {
try {
return lstatSync(filePath, { throwIfNoEntry: false })?.isSymbolicLink() ?? false
} catch {
return false
}
}
export function resolveSymlink(filePath: string): string {
try {
const stats = lstatSync(filePath, { throwIfNoEntry: false })
if (stats?.isSymbolicLink()) {
return resolve(filePath, "..", readlinkSync(filePath))
}
return filePath
} catch {
return filePath
}
}

View File

@@ -8,3 +8,4 @@ export * from "./tool-name"
export * from "./pattern-matcher"
export * from "./hook-disabled"
export * from "./deep-merge"
export * from "./file-utils"

View File

@@ -1,3 +1,5 @@
import { isPlainObject } from "./deep-merge"
export function camelToSnake(str: string): string {
return str.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`)
}
@@ -6,10 +8,6 @@ export function snakeToCamel(str: string): string {
return str.replace(/_([a-z])/g, (_, letter) => letter.toUpperCase())
}
function isPlainObject(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null && !Array.isArray(value)
}
export function objectToSnakeCase(
obj: Record<string, unknown>,
deep: boolean = true

View File

@@ -34,6 +34,7 @@ import type { BackgroundManager } from "../features/background-agent"
type OpencodeClient = PluginInput["client"]
export { createCallOmoAgent } from "./call-omo-agent"
export { createLookAt } from "./look-at"
export function createBackgroundTools(manager: BackgroundManager, client: OpencodeClient) {
return {

View File

@@ -0,0 +1,23 @@
export const MULTIMODAL_LOOKER_AGENT = "multimodal-looker" as const
export const LOOK_AT_DESCRIPTION = `Analyze media files (PDFs, images, diagrams) that require visual interpretation.
Use this tool to extract specific information from files that cannot be processed as plain text:
- PDF documents: extract text, tables, structure, specific sections
- Images: describe layouts, UI elements, text content, diagrams
- Charts/Graphs: explain data, trends, relationships
- Screenshots: identify UI components, text, visual elements
- Architecture diagrams: explain flows, connections, components
Parameters:
- file_path: Absolute path to the file to analyze
- goal: What specific information to extract (be specific for better results)
Examples:
- "Extract all API endpoints from this OpenAPI spec PDF"
- "Describe the UI layout and components in this screenshot"
- "Explain the data flow in this architecture diagram"
- "List all table data from page 3 of this PDF"
This tool uses a separate context window with Gemini 2.5 Flash for multimodal analysis,
saving tokens in the main conversation while providing accurate visual interpretation.`

View File

@@ -0,0 +1,3 @@
export * from "./types"
export * from "./constants"
export { createLookAt } from "./tools"

View File

@@ -0,0 +1,91 @@
import { tool, type PluginInput } from "@opencode-ai/plugin"
import { LOOK_AT_DESCRIPTION, MULTIMODAL_LOOKER_AGENT } from "./constants"
import type { LookAtArgs } from "./types"
import { log } from "../../shared/logger"
export function createLookAt(ctx: PluginInput) {
return tool({
description: LOOK_AT_DESCRIPTION,
args: {
file_path: tool.schema.string().describe("Absolute path to the file to analyze"),
goal: tool.schema.string().describe("What specific information to extract from the file"),
},
async execute(args: LookAtArgs, toolContext) {
log(`[look_at] Analyzing file: ${args.file_path}, goal: ${args.goal}`)
const prompt = `Analyze this file and extract the requested information.
File path: ${args.file_path}
Goal: ${args.goal}
Read the file using the Read tool, then provide ONLY the extracted information that matches the goal.
Be thorough on what was requested, concise on everything else.
If the requested information is not found, clearly state what is missing.`
log(`[look_at] Creating session with parent: ${toolContext.sessionID}`)
const createResult = await ctx.client.session.create({
body: {
parentID: toolContext.sessionID,
title: `look_at: ${args.goal.substring(0, 50)}`,
},
})
if (createResult.error) {
log(`[look_at] Session create error:`, createResult.error)
return `Error: Failed to create session: ${createResult.error}`
}
const sessionID = createResult.data.id
log(`[look_at] Created session: ${sessionID}`)
log(`[look_at] Sending prompt to session ${sessionID}`)
await ctx.client.session.prompt({
path: { id: sessionID },
body: {
agent: MULTIMODAL_LOOKER_AGENT,
tools: {
task: false,
call_omo_agent: false,
look_at: false,
},
parts: [{ type: "text", text: prompt }],
},
})
log(`[look_at] Prompt sent, fetching messages...`)
const messagesResult = await ctx.client.session.messages({
path: { id: sessionID },
})
if (messagesResult.error) {
log(`[look_at] Messages error:`, messagesResult.error)
return `Error: Failed to get messages: ${messagesResult.error}`
}
const messages = messagesResult.data
log(`[look_at] Got ${messages.length} messages`)
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const lastAssistantMessage = messages
.filter((m: any) => m.info.role === "assistant")
.sort((a: any, b: any) => (b.info.time?.created || 0) - (a.info.time?.created || 0))[0]
if (!lastAssistantMessage) {
log(`[look_at] No assistant message found`)
return `Error: No response from multimodal-looker agent`
}
log(`[look_at] Found assistant message with ${lastAssistantMessage.parts.length} parts`)
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const textParts = lastAssistantMessage.parts.filter((p: any) => p.type === "text")
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const responseText = textParts.map((p: any) => p.text).join("\n")
log(`[look_at] Got response, length: ${responseText.length}`)
return responseText
},
})
}

View File

@@ -0,0 +1,4 @@
export interface LookAtArgs {
file_path: string
goal: string
}

View File

@@ -1,9 +1,10 @@
import { tool } from "@opencode-ai/plugin"
import { existsSync, readdirSync, statSync, readlinkSync, readFileSync } from "fs"
import { existsSync, readdirSync, readFileSync } from "fs"
import { homedir } from "os"
import { join, resolve, basename } from "path"
import { join, basename } from "path"
import { z } from "zod/v4"
import { parseFrontmatter, resolveCommandsInText } from "../../shared"
import { resolveSymlink } from "../../shared/file-utils"
import { SkillFrontmatterSchema } from "./types"
import type { SkillScope, SkillMetadata, SkillInfo, LoadedSkill, SkillFrontmatter } from "./types"
@@ -37,15 +38,7 @@ function discoverSkillsFromDir(
const skillPath = join(skillsDir, entry.name)
if (entry.isDirectory() || entry.isSymbolicLink()) {
let resolvedPath = skillPath
try {
const stats = statSync(skillPath, { throwIfNoEntry: false })
if (stats?.isSymbolicLink()) {
resolvedPath = resolve(skillPath, "..", readlinkSync(skillPath))
}
} catch {
continue
}
const resolvedPath = resolveSymlink(skillPath)
const skillMdPath = join(resolvedPath, "SKILL.md")
if (!existsSync(skillMdPath)) continue
@@ -83,18 +76,6 @@ const skillListForDescription = availableSkills
.map((s) => `- ${s.name}: ${s.description} (${s.scope})`)
.join("\n")
function resolveSymlink(skillPath: string): string {
try {
const stats = statSync(skillPath, { throwIfNoEntry: false })
if (stats?.isSymbolicLink()) {
return resolve(skillPath, "..", readlinkSync(skillPath))
}
return skillPath
} catch {
return skillPath
}
}
async function parseSkillMd(skillPath: string): Promise<SkillInfo | null> {
const resolvedPath = resolveSymlink(skillPath)
const skillMdPath = join(resolvedPath, "SKILL.md")

View File

@@ -3,6 +3,7 @@ import { existsSync, readdirSync, readFileSync } from "fs"
import { homedir } from "os"
import { join, basename, dirname } from "path"
import { parseFrontmatter, resolveCommandsInText, resolveFileReferencesInText, sanitizeModelField } from "../../shared"
import { isMarkdownFile } from "../../shared/file-utils"
import type { CommandScope, CommandMetadata, CommandInfo } from "./types"
function discoverCommandsFromDir(commandsDir: string, scope: CommandScope): CommandInfo[] {
@@ -14,9 +15,7 @@ function discoverCommandsFromDir(commandsDir: string, scope: CommandScope): Comm
const commands: CommandInfo[] = []
for (const entry of entries) {
if (entry.name.startsWith(".")) continue
if (!entry.name.endsWith(".md")) continue
if (!entry.isFile()) continue
if (!isMarkdownFile(entry)) continue
const commandPath = join(commandsDir, entry.name)
const commandName = basename(entry.name, ".md")