update agent-browser skill to match upstream v0.16.3

Sync SKILL.md and inline template with vercel-labs/agent-browser v0.16.3.
Adds: native Rust daemon, diff commands, annotated screenshots, profiler,
keyboard type/inserttext, get styles, expanded locators (placeholder/alt/
title/testid/last), security options, config file support, iOS Simulator,
cloud providers (Browserbase/Browser Use/Kernel), session persistence,
CDP auto-connect, and state management commands.
This commit is contained in:
YeonGyu-Kim
2026-03-05 17:31:12 +09:00
parent 5137df72d8
commit 89dc302403
2 changed files with 367 additions and 99 deletions

View File

@@ -26,29 +26,35 @@ agent-browser close # Close browser
### Navigation
```bash
agent-browser open <url> # Navigate to URL
agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
agent-browser close # Close browser (aliases: quit, exit)
```
### Snapshot (page analysis)
```bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
```
The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
### Interactions (use @refs from snapshot)
```bash
agent-browser click @e1 # Click
agent-browser click @e1 # Click (--new-tab to open in new tab)
agent-browser dblclick @e1 # Double-click
agent-browser focus @e1 # Focus element
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser keyboard type "text" # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext "text" # Insert text without key events (no selector)
agent-browser press Enter # Press key
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
@@ -57,8 +63,8 @@ agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown
agent-browser scroll down 500 # Scroll page
agent-browser scrollintoview @e1 # Scroll element into view
agent-browser scroll down 500 # Scroll page (--selector <sel> for container)
agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto)
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
```
@@ -73,6 +79,7 @@ agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
agent-browser get styles @e1 # Get computed styles
```
### Check state
@@ -84,12 +91,20 @@ agent-browser is checked @e1 # Check if checked
### Screenshots & PDF
```bash
agent-browser screenshot # Screenshot to stdout
agent-browser screenshot # Screenshot (saves to temp dir if no path)
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full # Full page
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
```
Annotated screenshots overlay numbered labels `[N]` on interactive elements. Each label corresponds to ref `@eN`, so refs work for both visual and text workflows:
```bash
agent-browser screenshot --annotate ./page.png
# Output: [1] @e1 button "Submit", [2] @e2 link "Home", [3] @e3 textbox "Email"
agent-browser click @e2 # Click the "Home" link labeled [2]
```
### Video recording
```bash
agent-browser record start ./demo.webm # Start recording (uses current URL + state)
@@ -109,10 +124,12 @@ agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
```
Load states: `load`, `domcontentloaded`, `networkidle`
### Mouse control
```bash
agent-browser mouse move 100 200 # Move mouse
agent-browser mouse down left # Press button
agent-browser mouse down left # Press button (left/right/middle)
agent-browser mouse up left # Release button
agent-browser mouse wheel 100 # Scroll wheel
```
@@ -122,10 +139,18 @@ agent-browser mouse wheel 100 # Scroll wheel
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search..." fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" text
```
Actions: `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
Options: `--name <name>` (filter role by accessible name), `--exact` (require exact text match)
### Browser settings
```bash
agent-browser set viewport 1920 1080 # Set viewport size
@@ -142,14 +167,13 @@ agent-browser set media dark # Emulate color scheme
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
agent-browser storage session # Get all sessionStorage
agent-browser storage session key # Get specific key
agent-browser storage session set k v # Set value
agent-browser storage session clear # Clear all
agent-browser storage session # Same for sessionStorage
```
### Network
@@ -179,13 +203,59 @@ agent-browser frame main # Back to main frame
### Dialogs
```bash
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog accept [text] # Accept dialog (with optional prompt text)
agent-browser dialog dismiss # Dismiss dialog
```
### Diff (compare snapshots, screenshots, URLs)
```bash
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
```
### JavaScript
```bash
agent-browser eval "document.title" # Run JavaScript
agent-browser eval -b "base64code" # Run base64-encoded JS
agent-browser eval --stdin # Read JS from stdin
```
### Debug & Profiling
```bash
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop profile.json # Stop and save profile
```
### State management
```bash
agent-browser state save auth.json # Save auth state
agent-browser state load auth.json # Load auth state
agent-browser state list # List saved state files
agent-browser state show <file> # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name] # Clear states for session
agent-browser state clear --all # Clear all saved states
agent-browser state clean --older-than <days> # Delete old states
```
### Setup
```bash
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
```
## Global Options
@@ -193,19 +263,60 @@ agent-browser eval "document.title" # Run JavaScript
| Option | Description |
|--------|-------------|
| `--session <name>` | Isolated browser session (`AGENT_BROWSER_SESSION` env) |
| `--session-name <name>` | Auto-save/restore session state (`AGENT_BROWSER_SESSION_NAME` env) |
| `--profile <path>` | Persistent browser profile (`AGENT_BROWSER_PROFILE` env) |
| `--state <path>` | Load storage state from JSON file (`AGENT_BROWSER_STATE` env) |
| `--headers <json>` | HTTP headers scoped to URL's origin |
| `--executable-path <path>` | Custom browser binary (`AGENT_BROWSER_EXECUTABLE_PATH` env) |
| `--extension <path>` | Load browser extension (repeatable; `AGENT_BROWSER_EXTENSIONS` env) |
| `--args <args>` | Browser launch args (`AGENT_BROWSER_ARGS` env) |
| `--user-agent <ua>` | Custom User-Agent (`AGENT_BROWSER_USER_AGENT` env) |
| `--proxy <url>` | Proxy server (`AGENT_BROWSER_PROXY` env) |
| `--proxy-bypass <hosts>` | Hosts to bypass proxy (`AGENT_BROWSER_PROXY_BYPASS` env) |
| `--ignore-https-errors` | Ignore HTTPS certificate errors |
| `--allow-file-access` | Allow file:// URLs to access local files |
| `-p, --provider <name>` | Cloud browser provider (`AGENT_BROWSER_PROVIDER` env) |
| `--device <name>` | iOS device name (`AGENT_BROWSER_IOS_DEVICE` env) |
| `--json` | Machine-readable JSON output |
| `--headed` | Show browser window (not headless) |
| `--full, -f` | Full page screenshot |
| `--annotate` | Annotated screenshot with numbered labels (`AGENT_BROWSER_ANNOTATE` env) |
| `--headed` | Show browser window (`AGENT_BROWSER_HEADED` env) |
| `--cdp <port\|wss://url>` | Connect via Chrome DevTools Protocol |
| `--auto-connect` | Auto-discover running Chrome (`AGENT_BROWSER_AUTO_CONNECT` env) |
| `--color-scheme <scheme>` | Color scheme: dark, light, no-preference (`AGENT_BROWSER_COLOR_SCHEME` env) |
| `--download-path <path>` | Default download directory (`AGENT_BROWSER_DOWNLOAD_PATH` env) |
| `--native` | [Experimental] Use native Rust daemon (`AGENT_BROWSER_NATIVE` env) |
| `--config <path>` | Custom config file (`AGENT_BROWSER_CONFIG` env) |
| `--debug` | Debug output |
### Security options
| Option | Description |
|--------|-------------|
| `--content-boundaries` | Wrap page output in boundary markers (`AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
| `--max-output <chars>` | Truncate page output to N characters (`AGENT_BROWSER_MAX_OUTPUT` env) |
| `--allowed-domains <list>` | Comma-separated allowed domain patterns (`AGENT_BROWSER_ALLOWED_DOMAINS` env) |
| `--action-policy <path>` | Path to action policy JSON file (`AGENT_BROWSER_ACTION_POLICY` env) |
| `--confirm-actions <list>` | Action categories requiring confirmation (`AGENT_BROWSER_CONFIRM_ACTIONS` env) |
## Configuration file
Create `agent-browser.json` for persistent defaults (no need to repeat flags):
**Locations (lowest to highest priority):**
1. `~/.agent-browser/config.json` — user-level defaults
2. `./agent-browser.json` — project-level overrides
3. `AGENT_BROWSER_*` environment variables
4. CLI flags override everything
```json
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data",
"native": true
}
```
## Example: Form submission
```bash
@@ -247,6 +358,13 @@ agent-browser open other-site.com
agent-browser set headers '{"X-Custom-Header": "value"}'
```
### Authentication Vault
```bash
# Store credentials locally (encrypted). The LLM never sees passwords.
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
```
## Sessions & Persistent Profiles
### Sessions (parallel browsers)
@@ -256,6 +374,13 @@ agent-browser --session test2 open site-b.com
agent-browser session list
```
### Session persistence (auto-save/restore)
```bash
agent-browser --session-name twitter open twitter.com
# Login once, state persists automatically across restarts
# State files stored in ~/.agent-browser/sessions/
```
### Persistent Profiles
Persists cookies, localStorage, IndexedDB, service workers, cache, login sessions across browser restarts.
```bash
@@ -263,9 +388,6 @@ agent-browser --profile ~/.myapp-profile open myapp.com
# Or via env var
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
```
- Use different profile paths for different projects
- Login once → restart browser → still logged in
- Stores: cookies, localStorage, IndexedDB, service workers, browser cache
## JSON output (for parsing)
@@ -275,62 +397,54 @@ agent-browser snapshot -i --json
agent-browser get text @e1 --json
```
## Debugging
## Local files
```bash
agent-browser open example.com --headed # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser record start ./debug.webm # Record from current page
agent-browser record stop # Save recording
agent-browser connect 9222 # Local CDP port
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
```
## CDP Mode
```bash
agent-browser connect 9222 # Local CDP port
agent-browser --cdp 9222 snapshot # Direct CDP on each command
agent-browser --cdp "wss://browser-service.com/cdp?token=..." snapshot # Remote via WebSocket
agent-browser console --clear # Clear console
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser --auto-connect snapshot # Auto-discover running Chrome
```
## Cloud providers
```bash
# Browserbase
BROWSERBASE_API_KEY="key" BROWSERBASE_PROJECT_ID="id" agent-browser -p browserbase open example.com
# Browser Use
BROWSER_USE_API_KEY="key" agent-browser -p browseruse open example.com
# Kernel
KERNEL_API_KEY="key" agent-browser -p kernel open example.com
```
## iOS Simulator
```bash
agent-browser device list # List available simulators
agent-browser -p ios --device "iPhone 16 Pro" open example.com # Launch Safari
agent-browser -p ios snapshot -i # Same commands as desktop
agent-browser -p ios tap @e1 # Tap
agent-browser -p ios swipe up # Mobile-specific
agent-browser -p ios close # Close session
```
## Native Mode (Experimental)
Pure Rust daemon using direct CDP — no Node.js/Playwright required:
```bash
agent-browser --native open example.com
# Or: export AGENT_BROWSER_NATIVE=1
# Or: {"native": true} in agent-browser.json
```
---
## Installation
### Step 1: Install agent-browser CLI
```bash
bun add -g agent-browser
```
### Step 2: Install Playwright browsers
**IMPORTANT**: `agent-browser install` may fail on some platforms (e.g., darwin-arm64) with "No binary found" error. In that case, install Playwright browsers directly:
```bash
# Create a temp project and install playwright
cd /tmp && bun init -y && bun add playwright
# Install Chromium browser
bun playwright install chromium
```
This downloads Chrome for Testing to `~/Library/Caches/ms-playwright/`.
### Verify installation
```bash
agent-browser open https://example.com --headed
```
If the browser opens successfully, installation is complete.
### Troubleshooting
| Error | Solution |
|-------|----------|
| `No binary found for darwin-arm64` | Run `bun playwright install chromium` in a project with playwright dependency |
| `Executable doesn't exist at .../chromium-XXXX` | Re-run `bun playwright install chromium` |
| Browser doesn't open | Ensure `--headed` flag is used for visible browser |
---
Run `agent-browser --help` for all commands. Repo: https://github.com/vercel-labs/agent-browser
Install: `bun add -g agent-browser && agent-browser install`. Run `agent-browser --help` for all commands. Repo: https://github.com/vercel-labs/agent-browser

View File

@@ -40,29 +40,35 @@ agent-browser close # Close browser
### Navigation
\`\`\`bash
agent-browser open <url> # Navigate to URL
agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
agent-browser close # Close browser (aliases: quit, exit)
\`\`\`
### Snapshot (page analysis)
\`\`\`bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
\`\`\`
The \`-C\` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.
### Interactions (use @refs from snapshot)
\`\`\`bash
agent-browser click @e1 # Click
agent-browser click @e1 # Click (--new-tab to open in new tab)
agent-browser dblclick @e1 # Double-click
agent-browser focus @e1 # Focus element
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser keyboard type "text" # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext "text" # Insert text without key events (no selector)
agent-browser press Enter # Press key
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
@@ -71,8 +77,8 @@ agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown
agent-browser scroll down 500 # Scroll page
agent-browser scrollintoview @e1 # Scroll element into view
agent-browser scroll down 500 # Scroll page (--selector <sel> for container)
agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto)
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
\`\`\`
@@ -87,6 +93,7 @@ agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
agent-browser get styles @e1 # Get computed styles
\`\`\`
### Check state
@@ -98,12 +105,20 @@ agent-browser is checked @e1 # Check if checked
### Screenshots & PDF
\`\`\`bash
agent-browser screenshot # Screenshot to stdout
agent-browser screenshot # Screenshot (saves to temp dir if no path)
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full # Full page
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
\`\`\`
Annotated screenshots overlay numbered labels \`[N]\` on interactive elements. Each label corresponds to ref \`@eN\`, so refs work for both visual and text workflows:
\`\`\`bash
agent-browser screenshot --annotate ./page.png
# Output: [1] @e1 button "Submit", [2] @e2 link "Home", [3] @e3 textbox "Email"
agent-browser click @e2 # Click the "Home" link labeled [2]
\`\`\`
### Video recording
\`\`\`bash
agent-browser record start ./demo.webm # Start recording (uses current URL + state)
@@ -123,10 +138,12 @@ agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
\`\`\`
Load states: \`load\`, \`domcontentloaded\`, \`networkidle\`
### Mouse control
\`\`\`bash
agent-browser mouse move 100 200 # Move mouse
agent-browser mouse down left # Press button
agent-browser mouse down left # Press button (left/right/middle)
agent-browser mouse up left # Release button
agent-browser mouse wheel 100 # Scroll wheel
\`\`\`
@@ -136,10 +153,18 @@ agent-browser mouse wheel 100 # Scroll wheel
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search..." fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" text
\`\`\`
Actions: \`click\`, \`fill\`, \`type\`, \`hover\`, \`focus\`, \`check\`, \`uncheck\`, \`text\`
Options: \`--name <name>\` (filter role by accessible name), \`--exact\` (require exact text match)
### Browser settings
\`\`\`bash
agent-browser set viewport 1920 1080 # Set viewport size
@@ -156,14 +181,13 @@ agent-browser set media dark # Emulate color scheme
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
agent-browser storage session # Get all sessionStorage
agent-browser storage session key # Get specific key
agent-browser storage session set k v # Set value
agent-browser storage session clear # Clear all
agent-browser storage session # Same for sessionStorage
\`\`\`
### Network
@@ -193,13 +217,59 @@ agent-browser frame main # Back to main frame
### Dialogs
\`\`\`bash
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog accept [text] # Accept dialog (with optional prompt text)
agent-browser dialog dismiss # Dismiss dialog
\`\`\`
### Diff (compare snapshots, screenshots, URLs)
\`\`\`bash
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
\`\`\`
### JavaScript
\`\`\`bash
agent-browser eval "document.title" # Run JavaScript
agent-browser eval -b "base64code" # Run base64-encoded JS
agent-browser eval --stdin # Read JS from stdin
\`\`\`
### Debug & Profiling
\`\`\`bash
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop profile.json # Stop and save profile
\`\`\`
### State management
\`\`\`bash
agent-browser state save auth.json # Save auth state
agent-browser state load auth.json # Load auth state
agent-browser state list # List saved state files
agent-browser state show <file> # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name] # Clear states for session
agent-browser state clear --all # Clear all saved states
agent-browser state clean --older-than <days> # Delete old states
\`\`\`
### Setup
\`\`\`bash
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
\`\`\`
## Global Options
@@ -207,19 +277,60 @@ agent-browser eval "document.title" # Run JavaScript
| Option | Description |
|--------|-------------|
| \`--session <name>\` | Isolated browser session (\`AGENT_BROWSER_SESSION\` env) |
| \`--session-name <name>\` | Auto-save/restore session state (\`AGENT_BROWSER_SESSION_NAME\` env) |
| \`--profile <path>\` | Persistent browser profile (\`AGENT_BROWSER_PROFILE\` env) |
| \`--state <path>\` | Load storage state from JSON file (\`AGENT_BROWSER_STATE\` env) |
| \`--headers <json>\` | HTTP headers scoped to URL's origin |
| \`--executable-path <path>\` | Custom browser binary (\`AGENT_BROWSER_EXECUTABLE_PATH\` env) |
| \`--extension <path>\` | Load browser extension (repeatable; \`AGENT_BROWSER_EXTENSIONS\` env) |
| \`--args <args>\` | Browser launch args (\`AGENT_BROWSER_ARGS\` env) |
| \`--user-agent <ua>\` | Custom User-Agent (\`AGENT_BROWSER_USER_AGENT\` env) |
| \`--proxy <url>\` | Proxy server (\`AGENT_BROWSER_PROXY\` env) |
| \`--proxy-bypass <hosts>\` | Hosts to bypass proxy (\`AGENT_BROWSER_PROXY_BYPASS\` env) |
| \`--ignore-https-errors\` | Ignore HTTPS certificate errors |
| \`--allow-file-access\` | Allow file:// URLs to access local files |
| \`-p, --provider <name>\` | Cloud browser provider (\`AGENT_BROWSER_PROVIDER\` env) |
| \`--device <name>\` | iOS device name (\`AGENT_BROWSER_IOS_DEVICE\` env) |
| \`--json\` | Machine-readable JSON output |
| \`--headed\` | Show browser window (not headless) |
| \`--full, -f\` | Full page screenshot |
| \`--annotate\` | Annotated screenshot with numbered labels (\`AGENT_BROWSER_ANNOTATE\` env) |
| \`--headed\` | Show browser window (\`AGENT_BROWSER_HEADED\` env) |
| \`--cdp <port\\|wss://url>\` | Connect via Chrome DevTools Protocol |
| \`--auto-connect\` | Auto-discover running Chrome (\`AGENT_BROWSER_AUTO_CONNECT\` env) |
| \`--color-scheme <scheme>\` | Color scheme: dark, light, no-preference (\`AGENT_BROWSER_COLOR_SCHEME\` env) |
| \`--download-path <path>\` | Default download directory (\`AGENT_BROWSER_DOWNLOAD_PATH\` env) |
| \`--native\` | [Experimental] Use native Rust daemon (\`AGENT_BROWSER_NATIVE\` env) |
| \`--config <path>\` | Custom config file (\`AGENT_BROWSER_CONFIG\` env) |
| \`--debug\` | Debug output |
### Security options
| Option | Description |
|--------|-------------|
| \`--content-boundaries\` | Wrap page output in boundary markers (\`AGENT_BROWSER_CONTENT_BOUNDARIES\` env) |
| \`--max-output <chars>\` | Truncate page output to N characters (\`AGENT_BROWSER_MAX_OUTPUT\` env) |
| \`--allowed-domains <list>\` | Comma-separated allowed domain patterns (\`AGENT_BROWSER_ALLOWED_DOMAINS\` env) |
| \`--action-policy <path>\` | Path to action policy JSON file (\`AGENT_BROWSER_ACTION_POLICY\` env) |
| \`--confirm-actions <list>\` | Action categories requiring confirmation (\`AGENT_BROWSER_CONFIRM_ACTIONS\` env) |
## Configuration file
Create \`agent-browser.json\` for persistent defaults (no need to repeat flags):
**Locations (lowest to highest priority):**
1. \`~/.agent-browser/config.json\` — user-level defaults
2. \`./agent-browser.json\` — project-level overrides
3. \`AGENT_BROWSER_*\` environment variables
4. CLI flags override everything
\`\`\`json
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data",
"native": true
}
\`\`\`
## Example: Form submission
\`\`\`bash
@@ -261,6 +372,13 @@ agent-browser open other-site.com
agent-browser set headers '{"X-Custom-Header": "value"}'
\`\`\`
### Authentication Vault
\`\`\`bash
# Store credentials locally (encrypted). The LLM never sees passwords.
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
\`\`\`
## Sessions & Persistent Profiles
### Sessions (parallel browsers)
@@ -270,6 +388,13 @@ agent-browser --session test2 open site-b.com
agent-browser session list
\`\`\`
### Session persistence (auto-save/restore)
\`\`\`bash
agent-browser --session-name twitter open twitter.com
# Login once, state persists automatically across restarts
# State files stored in ~/.agent-browser/sessions/
\`\`\`
### Persistent Profiles
Persists cookies, localStorage, IndexedDB, service workers, cache, login sessions across browser restarts.
\`\`\`bash
@@ -277,9 +402,6 @@ agent-browser --profile ~/.myapp-profile open myapp.com
# Or via env var
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
\`\`\`
- Use different profile paths for different projects
- Login once → restart browser → still logged in
- Stores: cookies, localStorage, IndexedDB, service workers, browser cache
## JSON output (for parsing)
@@ -289,21 +411,53 @@ agent-browser snapshot -i --json
agent-browser get text @e1 --json
\`\`\`
## Debugging
## Local files
\`\`\`bash
agent-browser open example.com --headed # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser record start ./debug.webm # Record from current page
agent-browser record stop # Save recording
agent-browser connect 9222 # Local CDP port
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
\`\`\`
## CDP Mode
\`\`\`bash
agent-browser connect 9222 # Local CDP port
agent-browser --cdp 9222 snapshot # Direct CDP on each command
agent-browser --cdp "wss://browser-service.com/cdp?token=..." snapshot # Remote via WebSocket
agent-browser console --clear # Clear console
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser --auto-connect snapshot # Auto-discover running Chrome
\`\`\`
## Cloud providers
\`\`\`bash
# Browserbase
BROWSERBASE_API_KEY="key" BROWSERBASE_PROJECT_ID="id" agent-browser -p browserbase open example.com
# Browser Use
BROWSER_USE_API_KEY="key" agent-browser -p browseruse open example.com
# Kernel
KERNEL_API_KEY="key" agent-browser -p kernel open example.com
\`\`\`
## iOS Simulator
\`\`\`bash
agent-browser device list # List available simulators
agent-browser -p ios --device "iPhone 16 Pro" open example.com # Launch Safari
agent-browser -p ios snapshot -i # Same commands as desktop
agent-browser -p ios tap @e1 # Tap
agent-browser -p ios swipe up # Mobile-specific
agent-browser -p ios close # Close session
\`\`\`
## Native Mode (Experimental)
Pure Rust daemon using direct CDP — no Node.js/Playwright required:
\`\`\`bash
agent-browser --native open example.com
# Or: export AGENT_BROWSER_NATIVE=1
# Or: {"native": true} in agent-browser.json
\`\`\`
---