docs: add look_at tool and multimodal-looker agent documentation

🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)
feat: add look_at tool and multimodal-looker agent
27 changed files with 313 additions and 64 deletions
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -26,6 +26,7 @@ permissions:
 jobs:
  publish:
    runs-on: ubuntu-latest
+    if: github.repository == 'code-yeongyu/oh-my-opencode'
    steps:
      - uses: actions/checkout@v4
        with:
--- a/README.ko.md
+++ b/README.ko.md
@@ -166,6 +166,18 @@ opencode auth login
 # 브라우저에서 OAuth 플로우 완료
 ```

+**⚠️ 알려진 이슈**: 현재 공식 npm 패키지에 400 에러(`"No tool call found for function call output with call_id"`)를 유발하는 버그가 있습니다. 수정 버전이 배포될 때까지 **핫픽스 브랜치 사용을 권장합니다**. `~/.config/opencode/package.json`을 수정하세요:
+
+```json
+{
+  "dependencies": {
+    "opencode-openai-codex-auth": "code-yeongyu/opencode-openai-codex-auth#fix/orphaned-function-call-output-with-tools"
+  }
+}
+```
+
+그 후 `cd ~/.config/opencode && bun i`를 실행하세요. `opencode.json`에서는 버전 없이 `"opencode-openai-codex-auth"`로 사용합니다 (`@4.1.0` 제외).
+
 #### 4.4 대안: 프록시 기반 인증

 프록시 기반 인증을 선호하는 사용자를 위해 [VibeProxy](https://github.com/automazeio/vibeproxy) (macOS) 또는 [CLIProxyAPI](https://github.com/router-for-me/CLIProxyAPI)를 대안으로 사용할 수 있습니다.
@@ -206,6 +218,7 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
 - **explore** (`opencode/grok-code`): 빠른 코드베이스 탐색, 파일 패턴 매칭. Claude Code는 Haiku를 쓰지만, 우리는 Grok을 씁니다. 현재 무료이고, 극도로 빠르며, 파일 탐색 작업에 충분한 지능을 갖췄기 때문입니다. Claude Code 에서 영감을 받았습니다.
 - **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): 개발자로 전향한 디자이너라는 설정을 갖고 있습니다. 멋진 UI를 만듭니다. 아름답고 창의적인 UI 코드를 생성하는 데 탁월한 Gemini를 사용합니다.
 - **document-writer** (`google/gemini-3-pro-preview`): 기술 문서 전문가라는 설정을 갖고 있습니다. Gemini 는 문학가입니다. 글을 기가막히게 씁니다.
+- **multimodal-looker** (`google/gemini-2.5-flash`): 시각적 콘텐츠 해석을 위한 전문 에이전트. PDF, 이미지, 다이어그램을 분석하여 정보를 추출합니다.

 각 에이전트는 메인 에이전트가 알아서 호출하지만, 명시적으로 요청할 수도 있습니다:

@@ -258,6 +271,12 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
  - 기본 `glob`은 타임아웃이 없습니다. ripgrep이 멈추면 무한정 대기합니다.
  - 이 도구는 타임아웃을 강제하고 만료 시 프로세스를 종료합니다.

+#### 내장 멀티모달 도구 (Built-in Multimodal Tools)
+
+- **look_at**: 시각적 해석이 필요한 미디어 파일(PDF, 이미지, 다이어그램 등)을 Gemini 2.5 Flash를 사용하여 분석합니다. Sourcegraph Ampcode의 `look_at` 도구에서 영감을 받았습니다.
+  - 파라미터: `file_path` (절대 경로), `goal` (추출할 정보)
+  - 사용 사례: PDF 텍스트 추출, 이미지 설명, 다이어그램 분석
+
 #### 내장 MCPs

 - **websearch_exa**: Exa AI 웹 검색. 실시간 웹 검색과 콘텐츠 스크래핑을 수행합니다. 관련 웹사이트에서 LLM에 최적화된 컨텍스트를 반환합니다.
@@ -332,6 +351,7 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
    - Use camelCase for function names
    ```
 - **Think Mode**: 확장된 사고(Extended Thinking)가 필요한 상황을 자동으로 감지하고 모드를 전환합니다. 사용자가 깊은 사고를 요청하는 표현(예: "think deeply", "ultrathink")을 감지하면, 추론 능력을 극대화하도록 모델 설정을 동적으로 조정합니다.
+- **Ultrawork Mode**: 사용자가 "ultrawork" 또는 "ulw" 키워드를 입력하면 자동으로 에이전트 오케스트레이션 가이드를 주입합니다. 메인 에이전트가 모든 가용한 전문 에이전트(탐색, 사서, 계획, UI)를 백그라운드 작업을 통해 병렬로 최대한 활용하도록 강제하며, 엄격한 TODO 추적 및 검증 프로토콜을 따르게 합니다.
 - **Anthropic Auto Compact**: Anthropic 모델 사용 시 컨텍스트 한계에 도달하면 대화 기록을 자동으로 압축하여 효율적으로 관리합니다.
 - **Empty Task Response Detector**: 서브 에이전트가 수행한 작업이 비어있거나 무의미한 응답을 반환하는 경우를 감지하여, 오류 없이 우아하게 처리합니다.
 - **Grep Output Truncator**: Grep 검색 결과가 너무 길어 컨텍스트를 장악해버리는 것을 방지하기 위해, 과도한 출력을 자동으로 자릅니다.
@@ -344,7 +364,7 @@ OpenCode 는 아주 확장가능하고 아주 커스터마이저블합니다.
 }
 ```

-사용 가능한 훅: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`
+사용 가능한 훅: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `ultrawork-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`

 > **참고**: `disabled_hooks`는 Oh My OpenCode의 내장 훅을 제어합니다. Claude Code의 `settings.json` 훅을 비활성화하려면 `claude_code.hooks: false`를 대신 사용하세요 ([호환성 토글](#호환성-토글) 참고).

--- a/README.md
+++ b/README.md
@@ -165,6 +165,18 @@ opencode auth login
 # Complete OAuth flow in browser
 ```

+**⚠️ Known Issue**: The official npm package currently has a bug that causes 400 errors (`"No tool call found for function call output with call_id"`). Until a fix is released, **use the hotfix branch instead**. Modify `~/.config/opencode/package.json`:
+
+```json
+{
+  "dependencies": {
+    "opencode-openai-codex-auth": "code-yeongyu/opencode-openai-codex-auth#fix/orphaned-function-call-output-with-tools"
+  }
+}
+```
+
+Then run `cd ~/.config/opencode && bun i`. In your `opencode.json`, use the plugin name without a version: `"opencode-openai-codex-auth"` (not `@4.1.0`).
+
 #### 4.4 Alternative: Proxy-based Authentication

 For users who prefer proxy-based authentication, [VibeProxy](https://github.com/automazeio/vibeproxy) (macOS) or [CLIProxyAPI](https://github.com/router-for-me/CLIProxyAPI) remain available as alternatives.
@@ -203,6 +215,7 @@ I believe in the right tool for the job. For your wallet's sake, use CLIProxyAPI
 - **explore** (`opencode/grok-code`): Fast exploration and pattern matching. Claude Code uses Haiku; we use Grok. It is currently free, blazing fast, and intelligent enough for file traversal. Inspired by Claude Code.
 - **frontend-ui-ux-engineer** (`google/gemini-3-pro-preview`): A designer turned developer. Creates stunning UIs. Uses Gemini because its creativity and UI code generation are superior.
 - **document-writer** (`google/gemini-3-pro-preview`): A technical writing expert. Gemini is a wordsmith; it writes prose that flows naturally.
+- **multimodal-looker** (`google/gemini-2.5-flash`): Specialized agent for visual content interpretation. Analyzes PDFs, images, and diagrams to extract information.

 Each agent is automatically invoked by the main agent, but you can also explicitly request them:

@@ -257,6 +270,12 @@ The features you use in your editor—other agents cannot access them. Oh My Ope
  - The default `glob` lacks timeout. If ripgrep hangs, it waits indefinitely.
  - This tool enforces timeouts and kills the process on expiration.

+#### Built-in Multimodal Tools
+
+- **look_at**: Analyzes media files (PDFs, images, diagrams) that require visual interpretation using Gemini 2.5 Flash. Inspired by Sourcegraph Ampcode's `look_at` tool.
+  - Parameters: `file_path` (absolute path), `goal` (what to extract)
+  - Use cases: PDF text extraction, image description, diagram analysis
+
 #### Built-in MCPs

 - **websearch_exa**: Exa AI web search. Performs real-time web searches and can scrape content from specific URLs. Returns LLM-optimized context from relevant websites.
@@ -330,6 +349,7 @@ Example workflow:
    - Use camelCase for function names
    ```
 - **Think Mode**: Automatic extended thinking detection and mode switching. Detects when user requests deep thinking (e.g., "think deeply", "ultrathink") and dynamically adjusts model settings for enhanced reasoning.
+- **Ultrawork Mode**: When user triggers "ultrawork" or "ulw" keywords, automatically injects agent orchestration guidance. Forces the main agent to leverage all available specialized agents (exploration, librarian, planning, UI) via background tasks in parallel, with strict TODO tracking and verification protocols.
 - **Anthropic Auto Compact**: Automatically compacts conversation history when approaching context limits for Anthropic models.
 - **Empty Task Response Detector**: Detects when subagent tasks return empty or meaningless responses and handles gracefully.
 - **Grep Output Truncator**: Prevents grep output from overwhelming the context by truncating excessively long results.
@@ -342,7 +362,7 @@ You can disable specific built-in hooks using `disabled_hooks` in `~/.config/ope
 }
 ```

-Available hooks: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`
+Available hooks: `todo-continuation-enforcer`, `context-window-monitor`, `session-recovery`, `session-notification`, `comment-checker`, `grep-output-truncator`, `directory-agents-injector`, `directory-readme-injector`, `empty-task-response-detector`, `think-mode`, `ultrawork-mode`, `anthropic-auto-compact`, `rules-injector`, `background-notification`, `auto-update-checker`

 > **Note**: `disabled_hooks` controls Oh My OpenCode's built-in hooks. To disable Claude Code's `settings.json` hooks, use `claude_code.hooks: false` instead (see [Compatibility Toggles](#compatibility-toggles)).
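
Review note: to make the `disabled_hooks` behaviour concrete, here is a minimal sketch of how a list of hook names could gate the built-in hooks. The hook names are a subset of the "Available hooks" list above; `PluginConfigSketch` and `isHookEnabled` are hypothetical names used for illustration and are not taken from the plugin's source.

```typescript
// Subset of the hook names listed under "Available hooks" in the README.
const HOOK_NAMES = [
  "think-mode",
  "ultrawork-mode",
  "anthropic-auto-compact",
  "grep-output-truncator",
] as const
type HookName = (typeof HOOK_NAMES)[number]

// Hypothetical config shape: only the disabled_hooks key is modelled here.
interface PluginConfigSketch {
  disabled_hooks?: HookName[]
}

// A hook runs unless its name appears in disabled_hooks.
function isHookEnabled(name: HookName, config: PluginConfigSketch): boolean {
  return !(config.disabled_hooks ?? []).includes(name)
}

const config: PluginConfigSketch = { disabled_hooks: ["ultrawork-mode"] }
const enabled = HOOK_NAMES.filter((name) => isHookEnabled(name, config))
console.log(enabled)
```

With this shape, adding `ultrawork-mode` to `disabled_hooks` would turn off the keyword detection described above while leaving every other hook active.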

--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
  "name": "oh-my-opencode",
-  "version": "1.0.0",
+  "version": "1.0.2",
  "description": "OpenCode plugin - custom agents (oracle, librarian) and enhanced features",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
--- a/src/agents/index.ts
+++ b/src/agents/index.ts
@@ -4,6 +4,7 @@ import { librarianAgent } from "./librarian"
 import { exploreAgent } from "./explore"
 import { frontendUiUxEngineerAgent } from "./frontend-ui-ux-engineer"
 import { documentWriterAgent } from "./document-writer"
+import { multimodalLookerAgent } from "./multimodal-looker"

 export const builtinAgents: Record<string, AgentConfig> = {
  oracle: oracleAgent,
@@ -11,6 +12,7 @@ export const builtinAgents: Record<string, AgentConfig> = {
  explore: exploreAgent,
  "frontend-ui-ux-engineer": frontendUiUxEngineerAgent,
  "document-writer": documentWriterAgent,
+  "multimodal-looker": multimodalLookerAgent,
 }

 export * from "./types"
--- a/src/agents/multimodal-looker.ts
+++ b/src/agents/multimodal-looker.ts
@@ -0,0 +1,42 @@
+import type { AgentConfig } from "@opencode-ai/sdk"
+
+export const multimodalLookerAgent: AgentConfig = {
+  description:
+    "Analyze media files (PDFs, images, diagrams) that require interpretation beyond raw text. Extracts specific information or summaries from documents, describes visual content. Use when you need analyzed/extracted data rather than literal file contents.",
+  mode: "subagent",
+  model: "google/gemini-2.5-flash",
+  temperature: 0.1,
+  tools: { Read: true },
+  prompt: `You interpret media files that cannot be read as plain text.
+
+Your job: examine the attached file and extract ONLY what was requested.
+
+When to use you:
+- Media files the Read tool cannot interpret
+- Extracting specific information or summaries from documents
+- Describing visual content in images or diagrams
+- When analyzed/extracted data is needed, not raw file contents
+
+When NOT to use you:
+- Source code or plain text files needing exact contents (use Read)
+- Files that need editing afterward (need literal content from Read)
+- Simple file reading where no interpretation is needed
+
+How you work:
+1. Receive a file path and a goal describing what to extract
+2. Read and analyze the file deeply
+3. Return ONLY the relevant extracted information
+4. The main agent never processes the raw file - you save context tokens
+
+For PDFs: extract text, structure, tables, data from specific sections
+For images: describe layouts, UI elements, text, diagrams, charts
+For diagrams: explain relationships, flows, architecture depicted
+
+Response rules:
+- Return extracted information directly, no preamble
+- If info not found, state clearly what's missing
+- Match the language of the request
+- Be thorough on the goal, concise on everything else
+
+Your output goes straight to the main agent for continued work.`,
+}
--- a/src/agents/types.ts
+++ b/src/agents/types.ts
@@ -6,6 +6,7 @@ export type AgentName =
  | "explore"
  | "frontend-ui-ux-engineer"
  | "document-writer"
+  | "multimodal-looker"

 export type AgentOverrideConfig = Partial<AgentConfig>

--- a/src/agents/utils.ts
+++ b/src/agents/utils.ts
@@ -5,6 +5,7 @@ import { librarianAgent } from "./librarian"
 import { exploreAgent } from "./explore"
 import { frontendUiUxEngineerAgent } from "./frontend-ui-ux-engineer"
 import { documentWriterAgent } from "./document-writer"
+import { multimodalLookerAgent } from "./multimodal-looker"
 import { deepMerge } from "../shared"

 const allBuiltinAgents: Record<AgentName, AgentConfig> = {
@@ -13,6 +14,7 @@ const allBuiltinAgents: Record<AgentName, AgentConfig> = {
  explore: exploreAgent,
  "frontend-ui-ux-engineer": frontendUiUxEngineerAgent,
  "document-writer": documentWriterAgent,
+  "multimodal-looker": multimodalLookerAgent,
 }

 function mergeAgentConfig(
--- a/src/features/claude-code-agent-loader/loader.ts
+++ b/src/features/claude-code-agent-loader/loader.ts
@@ -3,6 +3,7 @@ import { homedir } from "os"
 import { join, basename } from "path"
 import type { AgentConfig } from "@opencode-ai/sdk"
 import { parseFrontmatter } from "../../shared/frontmatter"
+import { isMarkdownFile } from "../../shared/file-utils"
 import type { AgentScope, AgentFrontmatter, LoadedAgent } from "./types"

 function parseToolsConfig(toolsStr?: string): Record<string, boolean> | undefined {
@@ -18,10 +19,6 @@ function parseToolsConfig(toolsStr?: string): Record<string, boolean> | undefine
  return result
 }

-function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
-  return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
-}
-
 function loadAgentsFromDir(agentsDir: string, scope: AgentScope): LoadedAgent[] {
  if (!existsSync(agentsDir)) {
    return []
--- a/src/features/claude-code-command-loader/loader.ts
+++ b/src/features/claude-code-command-loader/loader.ts
@@ -3,12 +3,9 @@ import { homedir } from "os"
 import { join, basename } from "path"
 import { parseFrontmatter } from "../../shared/frontmatter"
 import { sanitizeModelField } from "../../shared/model-sanitizer"
+import { isMarkdownFile } from "../../shared/file-utils"
 import type { CommandScope, CommandDefinition, CommandFrontmatter, LoadedCommand } from "./types"

-function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
-  return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
-}
-
 function loadCommandsFromDir(commandsDir: string, scope: CommandScope): LoadedCommand[] {
  if (!existsSync(commandsDir)) {
    return []
--- a/src/features/claude-code-skill-loader/loader.ts
+++ b/src/features/claude-code-skill-loader/loader.ts
@@ -1,8 +1,9 @@
-import { existsSync, readdirSync, readFileSync, statSync, readlinkSync } from "fs"
+import { existsSync, readdirSync, readFileSync } from "fs"
 import { homedir } from "os"
-import { join, resolve } from "path"
+import { join } from "path"
 import { parseFrontmatter } from "../../shared/frontmatter"
 import { sanitizeModelField } from "../../shared/model-sanitizer"
+import { resolveSymlink } from "../../shared/file-utils"
 import type { CommandDefinition } from "../claude-code-command-loader/types"
 import type { SkillScope, SkillMetadata, LoadedSkillAsCommand } from "./types"

@@ -21,10 +22,7 @@ function loadSkillsFromDir(skillsDir: string, scope: SkillScope): LoadedSkillAsC

    if (!entry.isDirectory() && !entry.isSymbolicLink()) continue

-    let resolvedPath = skillPath
-    if (statSync(skillPath, { throwIfNoEntry: false })?.isSymbolicLink()) {
-      resolvedPath = resolve(skillPath, "..", readlinkSync(skillPath))
-    }
+    const resolvedPath = resolveSymlink(skillPath)

    const skillMdPath = join(resolvedPath, "SKILL.md")
    if (!existsSync(skillMdPath)) continue
--- a/src/hooks/think-mode/index.ts
+++ b/src/hooks/think-mode/index.ts
@@ -1,6 +1,7 @@
 import { detectThinkKeyword, extractPromptText } from "./detector"
-import { getHighVariant, isAlreadyHighVariant } from "./switcher"
+import { getHighVariant, isAlreadyHighVariant, getThinkingConfig } from "./switcher"
 import type { ThinkModeState, ThinkModeInput } from "./types"
+import { log } from "../../shared"

 export * from "./detector"
 export * from "./switcher"
@@ -23,6 +24,7 @@ export function createThinkModeHook() {
      const state: ThinkModeState = {
        requested: false,
        modelSwitched: false,
+        thinkingConfigInjected: false,
      }

      if (!detectThinkKeyword(promptText)) {
@@ -47,17 +49,31 @@ export function createThinkModeHook() {
      }

      const highVariant = getHighVariant(currentModel.modelID)
+      const thinkingConfig = getThinkingConfig(currentModel.providerID, currentModel.modelID)

-      if (!highVariant) {
-        thinkModeState.set(sessionID, state)
-        return
+      if (highVariant) {
+        output.message.model = {
+          providerID: currentModel.providerID,
+          modelID: highVariant,
+        }
+        state.modelSwitched = true
+        log("Think mode: model switched to high variant", {
+          sessionID,
+          from: currentModel.modelID,
+          to: highVariant,
+        })
      }

-      output.message.model = {
-        providerID: currentModel.providerID,
-        modelID: highVariant,
+      if (thinkingConfig) {
+        Object.assign(output.message, thinkingConfig)
+        state.thinkingConfigInjected = true
+        log("Think mode: thinking config injected", {
+          sessionID,
+          provider: currentModel.providerID,
+          config: thinkingConfig,
+        })
      }
-      state.modelSwitched = true
+
      thinkModeState.set(sessionID, state)
    },

--- a/src/hooks/think-mode/switcher.ts
+++ b/src/hooks/think-mode/switcher.ts
@@ -55,12 +55,14 @@ export const THINKING_CONFIGS: Record<string, Record<string, unknown>> = {
      type: "enabled",
      budgetTokens: 64000,
    },
+    maxTokens: 128000,
  },
  "amazon-bedrock": {
    reasoningConfig: {
      type: "enabled",
      budgetTokens: 32000,
    },
+    maxTokens: 64000,
  },
  google: {
    providerOptions: {
--- a/src/hooks/think-mode/types.ts
+++ b/src/hooks/think-mode/types.ts
@@ -1,6 +1,7 @@
 export interface ThinkModeState {
  requested: boolean
  modelSwitched: boolean
+  thinkingConfigInjected: boolean
  providerID?: string
  modelID?: string
 }
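
Review note: the think-mode change makes the model switch and the thinking-config injection two independent steps, so a model with no high variant can still receive a thinking config. The sketch below illustrates that control flow under stated assumptions: `HIGH_VARIANTS`, `THINKING_CONFIGS_SKETCH`, `OutgoingMessage`, and `applyThinkMode` are hypothetical stand-ins, and only the `amazon-bedrock` values are copied from the diff.

```typescript
// Hypothetical lookup tables; the real ones live in switcher.ts.
const HIGH_VARIANTS: Record<string, string> = {
  "example-model": "example-model-high", // made-up model IDs
}
const THINKING_CONFIGS_SKETCH: Record<string, Record<string, unknown>> = {
  // Values as shown in the switcher.ts diff for Amazon Bedrock.
  "amazon-bedrock": {
    reasoningConfig: { type: "enabled", budgetTokens: 32000 },
    maxTokens: 64000,
  },
}

interface ModelRef {
  providerID: string
  modelID: string
}
interface OutgoingMessage {
  model?: ModelRef
  [key: string]: unknown
}

function applyThinkMode(message: OutgoingMessage, current: ModelRef) {
  const state = { modelSwitched: false, thinkingConfigInjected: false }
  const highVariant = HIGH_VARIANTS[current.modelID]
  const thinkingConfig = THINKING_CONFIGS_SKETCH[current.providerID]

  // Step 1: switch to the high variant only when one is known.
  if (highVariant) {
    message.model = { providerID: current.providerID, modelID: highVariant }
    state.modelSwitched = true
  }
  // Step 2: inject the provider's thinking config regardless of step 1.
  if (thinkingConfig) {
    Object.assign(message, thinkingConfig)
    state.thinkingConfigInjected = true
  }
  return state
}

const message: OutgoingMessage = {}
const state = applyThinkMode(message, {
  providerID: "amazon-bedrock",
  modelID: "model-without-high-variant",
})
console.log(state, message)
```

Before this change the hook returned early when `getHighVariant` found nothing, so the second step never ran for such models.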
--- a/src/hooks/ultrawork-mode/index.ts
+++ b/src/hooks/ultrawork-mode/index.ts
@@ -2,6 +2,7 @@ import { detectUltraworkKeyword, extractPromptText } from "./detector"
 import { ULTRAWORK_CONTEXT } from "./constants"
 import type { UltraworkModeState } from "./types"
 import { log } from "../../shared"
+import { injectHookMessage } from "../../features/hook-message-injector"

 export * from "./detector"
 export * from "./constants"
@@ -16,13 +17,13 @@ export function clearUltraworkModeState(sessionID: string): void {
 export function createUltraworkModeHook() {
  return {
    /**
-     * chat.message hook - detect ultrawork/ulw keywords, inject context
+     * chat.message hook - detect ultrawork/ulw keywords, inject context via history
     *
     * Execution timing: AFTER claudeCodeHooks["chat.message"]
     * Behavior:
     *   1. Extract text from user prompt
     *   2. Detect ultrawork/ulw keywords (excluding code blocks)
-     *   3. If detected, prepend ULTRAWORK_CONTEXT to first text part
+     *   3. If detected, inject ULTRAWORK_CONTEXT via injectHookMessage (history injection)
     */
    "chat.message": async (
      input: {
@@ -51,13 +52,25 @@ export function createUltraworkModeHook() {
      state.detected = true
      log("Ultrawork keyword detected", { sessionID: input.sessionID })

-      const parts = output.parts as Array<{ type: string; text?: string }>
-      const idx = parts.findIndex((p) => p.type === "text" && p.text)
+      const message = output.message as {
+        agent?: string
+        model?: { modelID?: string; providerID?: string }
+        path?: { cwd?: string; root?: string }
+        tools?: Record<string, boolean>
+      }

-      if (idx >= 0) {
-        parts[idx].text = `${ULTRAWORK_CONTEXT}${parts[idx].text ?? ""}`
+      const success = injectHookMessage(input.sessionID, ULTRAWORK_CONTEXT, {
+        agent: message.agent,
+        model: message.model,
+        path: message.path,
+        tools: message.tools,
+      })
+
+      if (success) {
        state.injected = true
-        log("Ultrawork context injected", { sessionID: input.sessionID })
+        log("Ultrawork context injected via history", { sessionID: input.sessionID })
+      } else {
+        log("Ultrawork context injection failed", { sessionID: input.sessionID })
      }

      ultraworkModeState.set(input.sessionID, state)
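
Review note: the hook's doc comment says keywords are detected "excluding code blocks", but `detector.ts` is not part of this diff. The following is one possible way to implement that rule, offered as an assumption rather than the actual detector: strip fenced and inline code first, then match whole words. All names here are hypothetical.

```typescript
// Build the fence marker at runtime so this sketch can itself sit inside a fenced block.
const FENCE = "`".repeat(3)
const FENCED_BLOCK = new RegExp(FENCE + "[\\s\\S]*?" + FENCE, "g")
const INLINE_CODE = /`[^`\n]*`/g

// Remove fenced blocks first, then inline code spans.
function stripCode(text: string): string {
  return text.replace(FENCED_BLOCK, " ").replace(INLINE_CODE, " ")
}

// Match "ultrawork" or "ulw" as whole words, case-insensitively, outside code.
function detectUltraworkKeywordSketch(text: string): boolean {
  return /\b(ultrawork|ulw)\b/i.test(stripCode(text))
}

const fencedOnly = `${FENCE}\nrun ultrawork here\n${FENCE}`
const checks = {
  plain: detectUltraworkKeywordSketch("ulw: fix the flaky build"),
  inline: detectUltraworkKeywordSketch("what does `ultrawork` do?"),
  fenced: detectUltraworkKeywordSketch(fencedOnly),
  substring: detectUltraworkKeywordSketch("a bulwark against regressions"),
}
console.log(checks)
```

Only the first prompt would trigger injection in this sketch; a keyword that appears solely inside code, or as part of a longer word, is ignored.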
--- a/src/index.ts
+++ b/src/index.ts
@@ -41,7 +41,7 @@ import {
  getCurrentSessionTitle,
 } from "./features/claude-code-session-state";
 import { updateTerminalTitle } from "./features/terminal";
-import { builtinTools, createCallOmoAgent, createBackgroundTools } from "./tools";
+import { builtinTools, createCallOmoAgent, createBackgroundTools, createLookAt } from "./tools";
 import { BackgroundManager } from "./features/background-agent";
 import { createBuiltinMcps } from "./mcp";
 import { OhMyOpenCodeConfigSchema, type OhMyOpenCodeConfig, type HookName } from "./config";
@@ -218,6 +218,7 @@ const OhMyOpenCodePlugin: Plugin = async (ctx) => {
  const backgroundTools = createBackgroundTools(backgroundManager, ctx.client);

  const callOmoAgent = createCallOmoAgent(ctx, backgroundManager);
+  const lookAt = createLookAt(ctx);

  const googleAuthHooks = pluginConfig.google_auth
    ? await createGoogleAntigravityAuthPlugin(ctx)
@@ -230,6 +231,7 @@ const OhMyOpenCodePlugin: Plugin = async (ctx) => {
      ...builtinTools,
      ...backgroundTools,
      call_omo_agent: callOmoAgent,
+      look_at: lookAt,
    },

    "chat.message": async (input, output) => {
@@ -268,6 +270,14 @@ const OhMyOpenCodePlugin: Plugin = async (ctx) => {
          call_omo_agent: false,
        };
      }
+      if (config.agent["multimodal-looker"]) {
+        config.agent["multimodal-looker"].tools = {
+          ...config.agent["multimodal-looker"].tools,
+          task: false,
+          call_omo_agent: false,
+          look_at: false,
+        };
+      }

      const mcpResult = (pluginConfig.claude_code?.mcp ?? true)
        ? await loadMcpConfigs()
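
Review note: the `multimodal-looker` block above relies on object spread order, where later keys win, so the three blocked tools are forced to `false` even if a user override enabled them. A minimal sketch of that merge follows; `restrictTools` and `LOOKER_BLOCKED` are hypothetical names.

```typescript
type ToolToggles = Record<string, boolean>

// Tools the looker subagent must not call, as listed in the diff.
const LOOKER_BLOCKED: ToolToggles = {
  task: false,
  call_omo_agent: false,
  look_at: false,
}

// Later spreads override earlier ones, so blocked entries always win.
function restrictTools(existing: ToolToggles | undefined, blocked: ToolToggles): ToolToggles {
  return { ...existing, ...blocked }
}

const merged = restrictTools({ Read: true, look_at: true }, LOOKER_BLOCKED)
console.log(merged)
```

Disabling `look_at` for the agent that `look_at` itself spawns keeps the subagent from recursively starting further looker sessions.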
--- a/src/shared/deep-merge.ts
+++ b/src/shared/deep-merge.ts
@@ -1,7 +1,7 @@
 const DANGEROUS_KEYS = new Set(["__proto__", "constructor", "prototype"]);
 const MAX_DEPTH = 50;

-function isPlainObject(value: unknown): value is Record<string, unknown> {
+export function isPlainObject(value: unknown): value is Record<string, unknown> {
  return (
    typeof value === "object" &&
    value !== null &&
--- a/src/shared/file-utils.ts
+++ b/src/shared/file-utils.ts
@@ -0,0 +1,26 @@
+import { lstatSync, readlinkSync } from "fs"
+import { resolve } from "path"
+
+export function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
+  return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
+}
+
+export function isSymbolicLink(filePath: string): boolean {
+  try {
+    return lstatSync(filePath, { throwIfNoEntry: false })?.isSymbolicLink() ?? false
+  } catch {
+    return false
+  }
+}
+
+export function resolveSymlink(filePath: string): string {
+  try {
+    const stats = lstatSync(filePath, { throwIfNoEntry: false })
+    if (stats?.isSymbolicLink()) {
+      return resolve(filePath, "..", readlinkSync(filePath))
+    }
+    return filePath
+  } catch {
+    return filePath
+  }
+}
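
Review note: a small usage sketch for the extracted `isMarkdownFile` predicate. The function body is copied from the diff; the `entries` array is a hypothetical stand-in for the `Dirent` objects that `readdirSync(dir, { withFileTypes: true })` would return.

```typescript
// Same predicate as in src/shared/file-utils.ts: a visible regular file ending in ".md".
function isMarkdownFile(entry: { name: string; isFile: () => boolean }): boolean {
  return !entry.name.startsWith(".") && entry.name.endsWith(".md") && entry.isFile()
}

// Hypothetical directory entries covering each rejection rule.
const entries = [
  { name: "review.md", isFile: () => true },   // accepted
  { name: ".hidden.md", isFile: () => true },  // rejected: dotfile
  { name: "notes.txt", isFile: () => true },   // rejected: wrong extension
  { name: "folder.md", isFile: () => false },  // rejected: not a regular file
]

const markdownNames = entries.filter(isMarkdownFile).map((entry) => entry.name)
console.log(markdownNames)
```

On the symlink helpers: `lstatSync` is the right call because `statSync` follows links and reports on the target, so `isSymbolicLink()` on its result never returns true. That matches the earlier fix listed in the commit table below.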
--- a/src/shared/index.ts
+++ b/src/shared/index.ts
@@ -8,3 +8,4 @@ export * from "./tool-name"
 export * from "./pattern-matcher"
 export * from "./hook-disabled"
 export * from "./deep-merge"
+export * from "./file-utils"
--- a/src/shared/snake-case.ts
+++ b/src/shared/snake-case.ts
@@ -1,3 +1,5 @@
+import { isPlainObject } from "./deep-merge"
+
 export function camelToSnake(str: string): string {
  return str.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`)
 }
@@ -6,10 +8,6 @@ export function snakeToCamel(str: string): string {
  return str.replace(/_([a-z])/g, (_, letter) => letter.toUpperCase())
 }

-function isPlainObject(value: unknown): value is Record<string, unknown> {
-  return typeof value === "object" && value !== null && !Array.isArray(value)
-}
-
 export function objectToSnakeCase(
  obj: Record<string, unknown>,
  deep: boolean = true
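
Review note: a self-contained sketch of the key conversion this file performs. `camelToSnake` and `snakeToCamel` are copied from the diff. `isPlainObjectSketch` reproduces the local check that this commit removes, and `objectToSnakeCaseSketch` is an assumed implementation, since the body of `objectToSnakeCase` is not shown here.

```typescript
function camelToSnake(str: string): string {
  return str.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`)
}

function snakeToCamel(str: string): string {
  return str.replace(/_([a-z])/g, (_, letter: string) => letter.toUpperCase())
}

// The check removed from snake-case.ts; the shared one in deep-merge.ts may be stricter.
function isPlainObjectSketch(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value)
}

// Assumed behaviour: rename keys, recursing into nested plain objects when deep is true.
function objectToSnakeCaseSketch(
  obj: Record<string, unknown>,
  deep: boolean = true
): Record<string, unknown> {
  const result: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(obj)) {
    result[camelToSnake(key)] =
      deep && isPlainObjectSketch(value) ? objectToSnakeCaseSketch(value, deep) : value
  }
  return result
}

const converted = objectToSnakeCaseSketch({
  budgetTokens: 1,
  providerOptions: { maxTokens: 2 },
  tags: ["keepMe"], // array values are passed through untouched
})
console.log(JSON.stringify(converted))
```

Because the two `isPlainObject` definitions were not identical, it may be worth confirming that the shared version treats class instances and arrays the way `objectToSnakeCase` expects.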
--- a/src/tools/index.ts
+++ b/src/tools/index.ts
@@ -34,6 +34,7 @@ import type { BackgroundManager } from "../features/background-agent"
 type OpencodeClient = PluginInput["client"]

 export { createCallOmoAgent } from "./call-omo-agent"
+export { createLookAt } from "./look-at"

 export function createBackgroundTools(manager: BackgroundManager, client: OpencodeClient) {
  return {
--- a/src/tools/look-at/constants.ts
+++ b/src/tools/look-at/constants.ts
@@ -0,0 +1,23 @@
+export const MULTIMODAL_LOOKER_AGENT = "multimodal-looker" as const
+
+export const LOOK_AT_DESCRIPTION = `Analyze media files (PDFs, images, diagrams) that require visual interpretation.
+
+Use this tool to extract specific information from files that cannot be processed as plain text:
+- PDF documents: extract text, tables, structure, specific sections
+- Images: describe layouts, UI elements, text content, diagrams
+- Charts/Graphs: explain data, trends, relationships
+- Screenshots: identify UI components, text, visual elements
+- Architecture diagrams: explain flows, connections, components
+
+Parameters:
+- file_path: Absolute path to the file to analyze
+- goal: What specific information to extract (be specific for better results)
+
+Examples:
+- "Extract all API endpoints from this OpenAPI spec PDF"
+- "Describe the UI layout and components in this screenshot"
+- "Explain the data flow in this architecture diagram"
+- "List all table data from page 3 of this PDF"
+
+This tool uses a separate context window with Gemini 2.5 Flash for multimodal analysis,
+saving tokens in the main conversation while providing accurate visual interpretation.`
--- a/src/tools/look-at/index.ts
+++ b/src/tools/look-at/index.ts
@@ -0,0 +1,3 @@
+export * from "./types"
+export * from "./constants"
+export { createLookAt } from "./tools"
--- a/src/tools/look-at/tools.ts
+++ b/src/tools/look-at/tools.ts
@@ -0,0 +1,91 @@
+import { tool, type PluginInput } from "@opencode-ai/plugin"
+import { LOOK_AT_DESCRIPTION, MULTIMODAL_LOOKER_AGENT } from "./constants"
+import type { LookAtArgs } from "./types"
+import { log } from "../../shared/logger"
+
+export function createLookAt(ctx: PluginInput) {
+  return tool({
+    description: LOOK_AT_DESCRIPTION,
+    args: {
+      file_path: tool.schema.string().describe("Absolute path to the file to analyze"),
+      goal: tool.schema.string().describe("What specific information to extract from the file"),
+    },
+    async execute(args: LookAtArgs, toolContext) {
+      log(`[look_at] Analyzing file: ${args.file_path}, goal: ${args.goal}`)
+
+      const prompt = `Analyze this file and extract the requested information.
+
+File path: ${args.file_path}
+Goal: ${args.goal}
+
+Read the file using the Read tool, then provide ONLY the extracted information that matches the goal.
+Be thorough on what was requested, concise on everything else.
+If the requested information is not found, clearly state what is missing.`
+
+      log(`[look_at] Creating session with parent: ${toolContext.sessionID}`)
+      const createResult = await ctx.client.session.create({
+        body: {
+          parentID: toolContext.sessionID,
+          title: `look_at: ${args.goal.substring(0, 50)}`,
+        },
+      })
+
+      if (createResult.error) {
+        log(`[look_at] Session create error:`, createResult.error)
+        return `Error: Failed to create session: ${createResult.error}`
+      }
+
+      const sessionID = createResult.data.id
+      log(`[look_at] Created session: ${sessionID}`)
+
+      log(`[look_at] Sending prompt to session ${sessionID}`)
+      await ctx.client.session.prompt({
+        path: { id: sessionID },
+        body: {
+          agent: MULTIMODAL_LOOKER_AGENT,
+          tools: {
+            task: false,
+            call_omo_agent: false,
+            look_at: false,
+          },
+          parts: [{ type: "text", text: prompt }],
+        },
+      })
+
+      log(`[look_at] Prompt sent, fetching messages...`)
+
+      const messagesResult = await ctx.client.session.messages({
+        path: { id: sessionID },
+      })
+
+      if (messagesResult.error) {
+        log(`[look_at] Messages error:`, messagesResult.error)
+        return `Error: Failed to get messages: ${messagesResult.error}`
+      }
+
+      const messages = messagesResult.data
+      log(`[look_at] Got ${messages.length} messages`)
+
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any
+      const lastAssistantMessage = messages
+        .filter((m: any) => m.info.role === "assistant")
+        .sort((a: any, b: any) => (b.info.time?.created || 0) - (a.info.time?.created || 0))[0]
+
+      if (!lastAssistantMessage) {
+        log(`[look_at] No assistant message found`)
+        return `Error: No response from multimodal-looker agent`
+      }
+
+      log(`[look_at] Found assistant message with ${lastAssistantMessage.parts.length} parts`)
+
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any
+      const textParts = lastAssistantMessage.parts.filter((p: any) => p.type === "text")
+      // eslint-disable-next-line @typescript-eslint/no-explicit-any
+      const responseText = textParts.map((p: any) => p.text).join("\n")
+
+      log(`[look_at] Got response, length: ${responseText.length}`)
+
+      return responseText
+    },
+  })
+}
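
Review note: the response extraction at the end of `execute` could be written without `any` by declaring the few fields it reads. The sketch below is one way to do that; `PartSketch`, `MessageSketch`, and `extractLastAssistantText` are hypothetical names, and the real SDK message types carry more fields than shown.

```typescript
// Minimal shapes covering only the fields the extraction reads.
interface PartSketch {
  type: string
  text?: string
}
interface MessageSketch {
  info: { role: "user" | "assistant"; time?: { created?: number } }
  parts: PartSketch[]
}

// Pick the newest assistant message and join its text parts with newlines.
function extractLastAssistantText(messages: MessageSketch[]): string | null {
  const newest = messages
    .filter((m) => m.info.role === "assistant") // filter copies, so sort does not mutate the input
    .sort((a, b) => (b.info.time?.created ?? 0) - (a.info.time?.created ?? 0))[0]
  if (!newest) return null
  return newest.parts
    .filter((p) => p.type === "text")
    .map((p) => p.text ?? "")
    .join("\n")
}

const sample: MessageSketch[] = [
  { info: { role: "user", time: { created: 1 } }, parts: [{ type: "text", text: "goal" }] },
  { info: { role: "assistant", time: { created: 2 } }, parts: [{ type: "text", text: "draft" }] },
  {
    info: { role: "assistant", time: { created: 3 } },
    parts: [
      { type: "tool", text: "ignored" },
      { type: "text", text: "Page 3 has" },
      { type: "text", text: "two tables." },
    ],
  },
]
const answer = extractLastAssistantText(sample)
console.log(answer)
```

Returning `null` for the "no assistant message" case lets the caller choose the error string, as the tool does with `Error: No response from multimodal-looker agent`.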
--- a/src/tools/look-at/types.ts
+++ b/src/tools/look-at/types.ts
@@ -0,0 +1,4 @@
+export interface LookAtArgs {
+  file_path: string
+  goal: string
+}
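
Review note: a short sketch of how `LookAtArgs` feeds the session title, plus an optional guard. `buildSessionTitle` mirrors the `substring(0, 50)` truncation in `tools.ts`. `isAbsolutePosixPath` is a hypothetical check that assumes POSIX-style paths; the diff documents `file_path` as absolute but does not validate it.

```typescript
interface LookAtArgs {
  file_path: string
  goal: string
}

// Same truncation as in tools.ts: at most 50 characters of the goal.
function buildSessionTitle(args: LookAtArgs): string {
  return `look_at: ${args.goal.substring(0, 50)}`
}

// Hypothetical guard, POSIX-only: not part of the diff.
function isAbsolutePosixPath(filePath: string): boolean {
  return filePath.startsWith("/")
}

const args: LookAtArgs = {
  file_path: "/tmp/spec.pdf",
  goal: "Extract all API endpoints from this OpenAPI spec PDF, including deprecated ones",
}
const title = buildSessionTitle(args)
console.log(title, isAbsolutePosixPath(args.file_path))
```

A check like this could return an early error string before a session is created, which would avoid spending a subagent call on a path the Read tool cannot resolve.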
--- a/src/tools/skill/tools.ts
+++ b/src/tools/skill/tools.ts
@@ -1,9 +1,10 @@
 import { tool } from "@opencode-ai/plugin"
-import { existsSync, readdirSync, statSync, readlinkSync, readFileSync } from "fs"
+import { existsSync, readdirSync, readFileSync } from "fs"
 import { homedir } from "os"
-import { join, resolve, basename } from "path"
+import { join, basename } from "path"
 import { z } from "zod/v4"
 import { parseFrontmatter, resolveCommandsInText } from "../../shared"
+import { resolveSymlink } from "../../shared/file-utils"
 import { SkillFrontmatterSchema } from "./types"
 import type { SkillScope, SkillMetadata, SkillInfo, LoadedSkill, SkillFrontmatter } from "./types"

@@ -37,15 +38,7 @@ function discoverSkillsFromDir(
    const skillPath = join(skillsDir, entry.name)

    if (entry.isDirectory() || entry.isSymbolicLink()) {
-      let resolvedPath = skillPath
-      try {
-        const stats = statSync(skillPath, { throwIfNoEntry: false })
-        if (stats?.isSymbolicLink()) {
-          resolvedPath = resolve(skillPath, "..", readlinkSync(skillPath))
-        }
-      } catch {
-        continue
-      }
+      const resolvedPath = resolveSymlink(skillPath)

      const skillMdPath = join(resolvedPath, "SKILL.md")
      if (!existsSync(skillMdPath)) continue
@@ -83,18 +76,6 @@ const skillListForDescription = availableSkills
  .map((s) => `- ${s.name}: ${s.description} (${s.scope})`)
  .join("\n")

-function resolveSymlink(skillPath: string): string {
-  try {
-    const stats = statSync(skillPath, { throwIfNoEntry: false })
-    if (stats?.isSymbolicLink()) {
-      return resolve(skillPath, "..", readlinkSync(skillPath))
-    }
-    return skillPath
-  } catch {
-    return skillPath
-  }
-}
-
 async function parseSkillMd(skillPath: string): Promise<SkillInfo | null> {
  const resolvedPath = resolveSymlink(skillPath)
  const skillMdPath = join(resolvedPath, "SKILL.md")
--- a/src/tools/slashcommand/tools.ts
+++ b/src/tools/slashcommand/tools.ts
@@ -3,6 +3,7 @@ import { existsSync, readdirSync, readFileSync } from "fs"
 import { homedir } from "os"
 import { join, basename, dirname } from "path"
 import { parseFrontmatter, resolveCommandsInText, resolveFileReferencesInText, sanitizeModelField } from "../../shared"
+import { isMarkdownFile } from "../../shared/file-utils"
 import type { CommandScope, CommandMetadata, CommandInfo } from "./types"

 function discoverCommandsFromDir(commandsDir: string, scope: CommandScope): CommandInfo[] {
@@ -14,9 +15,7 @@ function discoverCommandsFromDir(commandsDir: string, scope: CommandScope): Comm
  const commands: CommandInfo[] = []

  for (const entry of entries) {
-    if (entry.name.startsWith(".")) continue
-    if (!entry.name.endsWith(".md")) continue
-    if (!entry.isFile()) continue
+    if (!isMarkdownFile(entry)) continue

    const commandPath = join(commandsDir, entry.name)
    const commandName = basename(entry.name, ".md")
Author	SHA1	Message	Date
YeonGyu-Kim	96886f18ac	docs: add look_at tool and multimodal-looker agent documentation 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)	2025-12-13 15:28:59 +09:00
YeonGyu-Kim	a3938e8c25	feat: add look_at tool and multimodal-looker agent Add a new tool and agent for analyzing media files (PDFs, images, diagrams) that require visual interpretation beyond raw text. - Add `multimodal-looker` agent using Gemini 2.5 Flash model - Add `look_at` tool that spawns multimodal-looker sessions - Restrict multimodal-looker from calling task/call_omo_agent/look_at tools Inspired by Sourcegraph Ampcode's look_at tool design. 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)	2025-12-13 15:28:59 +09:00
YeonGyu-Kim	821b0b8e9f	docs: add known issue and hotfix for opencode-openai-codex-auth 400 error 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)	2025-12-13 15:28:59 +09:00
Junho Yeo	356bd1dff3	fix(ci): prevent publish workflow from running on forks (#34)	2025-12-13 14:48:18 +09:00
github-actions[bot]	f2b070cd0b	release: v1.0.2	2025-12-13 05:24:38 +00:00
Junho Yeo	1323443c85	refactor: extract shared utilities (`isMarkdownFile`, `isPlainObject`, `resolveSymlink`) (#33)	2025-12-13 14:23:04 +09:00
github-actions[bot]	60d9513d3a	release: v1.0.1	2025-12-13 05:06:31 +00:00
YeonGyu-Kim	55bc8f08df	refactor(ultrawork-mode): use history injection instead of direct message modification - Replace direct parts[idx].text modification with injectHookMessage - Context now injected via filesystem (like UserPromptSubmitHook) - Preserves original user message without modification 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)	2025-12-13 14:05:17 +09:00
YeonGyu-Kim	0ac4d223f9	feat(think-mode): inject thinking config with maxTokens for extended thinking - Actually inject THINKING_CONFIGS into message (was defined but unused) - Add maxTokens: 128000 for Anthropic (required for extended thinking) - Add maxTokens: 64000 for Amazon Bedrock - Track thinkingConfigInjected state 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)	2025-12-13 14:05:02 +09:00
YeonGyu-Kim	19b3690499	docs: add Ultrawork Mode hook documentation 🤖 GENERATED WITH ASSISTANCE OF [OhMyOpenCode](https://github.com/code-yeongyu/oh-my-opencode)	2025-12-13 14:02:10 +09:00
Junho Yeo	564c8ae8bf	fix: use `lstatSync` instead of `statSync` for symlink detection (#32)	2025-12-13 13:58:02 +09:00