update agent-browser skill to match upstream v0.16.3
Sync SKILL.md and inline template with vercel-labs/agent-browser v0.16.3. Adds: native Rust daemon, diff commands, annotated screenshots, profiler, keyboard type/inserttext, get styles, expanded locators (placeholder/alt/title/testid/last), security options, config file support, iOS Simulator, cloud providers (Browserbase/Browser Use/Kernel), session persistence, CDP auto-connect, and state management commands.
@@ -26,29 +26,35 @@ agent-browser close # Close browser

### Navigation
```bash
agent-browser open <url> # Navigate to URL
agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
agent-browser close # Close browser (aliases: quit, exit)
```

### Snapshot (page analysis)
```bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
```

The `-C` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.

### Interactions (use @refs from snapshot)
```bash
agent-browser click @e1 # Click
agent-browser click @e1 # Click (--new-tab to open in new tab)
agent-browser dblclick @e1 # Double-click
agent-browser focus @e1 # Focus element
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser keyboard type "text" # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext "text" # Insert text without key events (no selector)
agent-browser press Enter # Press key
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
@@ -57,8 +63,8 @@ agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown
agent-browser scroll down 500 # Scroll page
agent-browser scrollintoview @e1 # Scroll element into view
agent-browser scroll down 500 # Scroll page (--selector <sel> for container)
agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto)
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
```
@@ -73,6 +79,7 @@ agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
agent-browser get styles @e1 # Get computed styles
```

### Check state
@@ -84,12 +91,20 @@ agent-browser is checked @e1 # Check if checked

### Screenshots & PDF
```bash
agent-browser screenshot # Screenshot to stdout
agent-browser screenshot # Screenshot (saves to temp dir if no path)
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full # Full page
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
```

Annotated screenshots overlay numbered labels `[N]` on interactive elements. Each label corresponds to ref `@eN`, so refs work for both visual and text workflows:
```bash
agent-browser screenshot --annotate ./page.png
# Output: [1] @e1 button "Submit", [2] @e2 link "Home", [3] @e3 textbox "Email"
agent-browser click @e2 # Click the "Home" link labeled [2]
```
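
The label-to-ref mapping described above can also be consumed from a script. The following is a minimal sketch, assuming the legend arrives as a single comma-separated line shaped exactly like the `# Output:` comment (the real output format may differ). `ref_for_name` is a hypothetical helper for illustration, not part of agent-browser.

```bash
# Hypothetical helper, not part of agent-browser: look up the ref of an
# element by its quoted name in an annotated-screenshot legend.
# Assumes one comma-separated line shaped like the "# Output:" comment.
ref_for_name() {
    # $1 = element name, $2 = legend text.
    # Split entries onto separate lines, keep the entry whose quoted name
    # matches, then print the first @eN token found in it.
    printf '%s\n' "$2" | tr ',' '\n' | grep -F "\"$1\"" | grep -o '@e[0-9][0-9]*' | head -n 1
}

legend='[1] @e1 button "Submit", [2] @e2 link "Home", [3] @e3 textbox "Email"'
ref_for_name "Home" "$legend"   # prints @e2
```

Element names that themselves contain commas would defeat the simple split, so this is only suitable for quick scripting.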

### Video recording
```bash
agent-browser record start ./demo.webm # Start recording (uses current URL + state)
@@ -109,10 +124,12 @@ agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
```

Load states: `load`, `domcontentloaded`, `networkidle`

### Mouse control
```bash
agent-browser mouse move 100 200 # Move mouse
agent-browser mouse down left # Press button
agent-browser mouse down left # Press button (left/right/middle)
agent-browser mouse up left # Release button
agent-browser mouse wheel 100 # Scroll wheel
```
@@ -122,10 +139,18 @@ agent-browser mouse wheel 100 # Scroll wheel
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search..." fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" text
```

Actions: `click`, `fill`, `type`, `hover`, `focus`, `check`, `uncheck`, `text`
Options: `--name <name>` (filter role by accessible name), `--exact` (require exact text match)

### Browser settings
```bash
agent-browser set viewport 1920 1080 # Set viewport size
@@ -142,14 +167,13 @@ agent-browser set media dark # Emulate color scheme
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies

agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
agent-browser storage session # Get all sessionStorage
agent-browser storage session key # Get specific key
agent-browser storage session set k v # Set value
agent-browser storage session clear # Clear all

agent-browser storage session # Same for sessionStorage
```

### Network
@@ -179,13 +203,59 @@ agent-browser frame main # Back to main frame

### Dialogs
```bash
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog accept [text] # Accept dialog (with optional prompt text)
agent-browser dialog dismiss # Dismiss dialog
```

### Diff (compare snapshots, screenshots, URLs)
```bash
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
```

### JavaScript
```bash
agent-browser eval "document.title" # Run JavaScript
agent-browser eval -b "base64code" # Run base64-encoded JS
agent-browser eval --stdin # Read JS from stdin
```
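
The `-b` form takes the script as base64 text, which avoids shell-quoting problems for snippets that contain quotes or newlines. One way to produce that text is sketched below, assuming a standard `base64` utility is on the PATH; `encode_js` is a hypothetical helper name, and the final `agent-browser` call is shown only as a comment.

```bash
# Hypothetical helper: base64-encode a JavaScript snippet for `eval -b`.
# `tr -d '\n'` strips the line wrapping some base64 builds add to long output.
encode_js() {
    printf '%s' "$1" | base64 | tr -d '\n'
}

encoded=$(encode_js 'document.title')
printf '%s\n' "$encoded"   # prints ZG9jdW1lbnQudGl0bGU=

# Then, following the listing above:
#   agent-browser eval -b "$encoded"
```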

### Debug & Profiling
```bash
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop profile.json # Stop and save profile
```

### State management
```bash
agent-browser state save auth.json # Save auth state
agent-browser state load auth.json # Load auth state
agent-browser state list # List saved state files
agent-browser state show <file> # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name] # Clear states for session
agent-browser state clear --all # Clear all saved states
agent-browser state clean --older-than <days> # Delete old states
```

### Setup
```bash
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
```

## Global Options
@@ -193,19 +263,60 @@ agent-browser eval "document.title" # Run JavaScript
| Option | Description |
|--------|-------------|
| `--session <name>` | Isolated browser session (`AGENT_BROWSER_SESSION` env) |
| `--session-name <name>` | Auto-save/restore session state (`AGENT_BROWSER_SESSION_NAME` env) |
| `--profile <path>` | Persistent browser profile (`AGENT_BROWSER_PROFILE` env) |
| `--state <path>` | Load storage state from JSON file (`AGENT_BROWSER_STATE` env) |
| `--headers <json>` | HTTP headers scoped to URL's origin |
| `--executable-path <path>` | Custom browser binary (`AGENT_BROWSER_EXECUTABLE_PATH` env) |
| `--extension <path>` | Load browser extension (repeatable; `AGENT_BROWSER_EXTENSIONS` env) |
| `--args <args>` | Browser launch args (`AGENT_BROWSER_ARGS` env) |
| `--user-agent <ua>` | Custom User-Agent (`AGENT_BROWSER_USER_AGENT` env) |
| `--proxy <url>` | Proxy server (`AGENT_BROWSER_PROXY` env) |
| `--proxy-bypass <hosts>` | Hosts to bypass proxy (`AGENT_BROWSER_PROXY_BYPASS` env) |
| `--ignore-https-errors` | Ignore HTTPS certificate errors |
| `--allow-file-access` | Allow file:// URLs to access local files |
| `-p, --provider <name>` | Cloud browser provider (`AGENT_BROWSER_PROVIDER` env) |
| `--device <name>` | iOS device name (`AGENT_BROWSER_IOS_DEVICE` env) |
| `--json` | Machine-readable JSON output |
| `--headed` | Show browser window (not headless) |
| `--full, -f` | Full page screenshot |
| `--annotate` | Annotated screenshot with numbered labels (`AGENT_BROWSER_ANNOTATE` env) |
| `--headed` | Show browser window (`AGENT_BROWSER_HEADED` env) |
| `--cdp <port\|wss://url>` | Connect via Chrome DevTools Protocol |
| `--auto-connect` | Auto-discover running Chrome (`AGENT_BROWSER_AUTO_CONNECT` env) |
| `--color-scheme <scheme>` | Color scheme: dark, light, no-preference (`AGENT_BROWSER_COLOR_SCHEME` env) |
| `--download-path <path>` | Default download directory (`AGENT_BROWSER_DOWNLOAD_PATH` env) |
| `--native` | [Experimental] Use native Rust daemon (`AGENT_BROWSER_NATIVE` env) |
| `--config <path>` | Custom config file (`AGENT_BROWSER_CONFIG` env) |
| `--debug` | Debug output |

### Security options
| Option | Description |
|--------|-------------|
| `--content-boundaries` | Wrap page output in boundary markers (`AGENT_BROWSER_CONTENT_BOUNDARIES` env) |
| `--max-output <chars>` | Truncate page output to N characters (`AGENT_BROWSER_MAX_OUTPUT` env) |
| `--allowed-domains <list>` | Comma-separated allowed domain patterns (`AGENT_BROWSER_ALLOWED_DOMAINS` env) |
| `--action-policy <path>` | Path to action policy JSON file (`AGENT_BROWSER_ACTION_POLICY` env) |
| `--confirm-actions <list>` | Action categories requiring confirmation (`AGENT_BROWSER_CONFIRM_ACTIONS` env) |

## Configuration file

Create `agent-browser.json` for persistent defaults (no need to repeat flags):

**Locations (lowest to highest priority):**
1. `~/.agent-browser/config.json` — user-level defaults
2. `./agent-browser.json` — project-level overrides
3. `AGENT_BROWSER_*` environment variables
4. CLI flags override everything

```json
{
  "headed": true,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data",
  "native": true
}
```
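
The priority order listed above amounts to "the last non-empty source wins". The sketch below only illustrates that rule in plain shell; it is not how agent-browser resolves settings internally, `resolve_setting` is a hypothetical helper, and it treats an empty string as "not set".

```bash
# Hypothetical illustration of the precedence rule, not agent-browser code.
# Pass candidate values lowest priority first:
#   user config, project config, environment variable, CLI flag.
resolve_setting() {
    result=""
    for candidate in "$@"; do
        # A later non-empty candidate overrides any earlier one.
        if [ -n "$candidate" ]; then
            result="$candidate"
        fi
    done
    printf '%s\n' "$result"
}

# The project file and a CLI flag both set a proxy, so the flag wins.
resolve_setting "" "http://localhost:8080" "" "http://localhost:9090"
# prints http://localhost:9090
```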

## Example: Form submission

```bash
@@ -247,6 +358,13 @@ agent-browser open other-site.com
agent-browser set headers '{"X-Custom-Header": "value"}'
```

### Authentication Vault
```bash
# Store credentials locally (encrypted). The LLM never sees passwords.
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
```

## Sessions & Persistent Profiles

### Sessions (parallel browsers)
@@ -256,6 +374,13 @@ agent-browser --session test2 open site-b.com
agent-browser session list
```

### Session persistence (auto-save/restore)
```bash
agent-browser --session-name twitter open twitter.com
# Login once, state persists automatically across restarts
# State files stored in ~/.agent-browser/sessions/
```

### Persistent Profiles
Persists cookies, localStorage, IndexedDB, service workers, cache, login sessions across browser restarts.
```bash
@@ -263,9 +388,6 @@ agent-browser --profile ~/.myapp-profile open myapp.com
# Or via env var
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
```
- Use different profile paths for different projects
- Login once → restart browser → still logged in
- Stores: cookies, localStorage, IndexedDB, service workers, browser cache

## JSON output (for parsing)

@@ -275,62 +397,54 @@ agent-browser snapshot -i --json
agent-browser get text @e1 --json
```

## Debugging
## Local files

```bash
agent-browser open example.com --headed # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser record start ./debug.webm # Record from current page
agent-browser record stop # Save recording
agent-browser connect 9222 # Local CDP port
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
```

## CDP Mode

```bash
agent-browser connect 9222 # Local CDP port
agent-browser --cdp 9222 snapshot # Direct CDP on each command
agent-browser --cdp "wss://browser-service.com/cdp?token=..." snapshot # Remote via WebSocket
agent-browser console --clear # Clear console
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser --auto-connect snapshot # Auto-discover running Chrome
```

## Cloud providers

```bash
# Browserbase
BROWSERBASE_API_KEY="key" BROWSERBASE_PROJECT_ID="id" agent-browser -p browserbase open example.com

# Browser Use
BROWSER_USE_API_KEY="key" agent-browser -p browseruse open example.com

# Kernel
KERNEL_API_KEY="key" agent-browser -p kernel open example.com
```

## iOS Simulator

```bash
agent-browser device list # List available simulators
agent-browser -p ios --device "iPhone 16 Pro" open example.com # Launch Safari
agent-browser -p ios snapshot -i # Same commands as desktop
agent-browser -p ios tap @e1 # Tap
agent-browser -p ios swipe up # Mobile-specific
agent-browser -p ios close # Close session
```

## Native Mode (Experimental)

Pure Rust daemon using direct CDP — no Node.js/Playwright required:
```bash
agent-browser --native open example.com
# Or: export AGENT_BROWSER_NATIVE=1
# Or: {"native": true} in agent-browser.json
```

---

## Installation

### Step 1: Install agent-browser CLI

```bash
bun add -g agent-browser
```

### Step 2: Install Playwright browsers

**IMPORTANT**: `agent-browser install` may fail on some platforms (e.g., darwin-arm64) with "No binary found" error. In that case, install Playwright browsers directly:

```bash
# Create a temp project and install playwright
cd /tmp && bun init -y && bun add playwright

# Install Chromium browser
bun playwright install chromium
```

This downloads Chrome for Testing to `~/Library/Caches/ms-playwright/`.

### Verify installation

```bash
agent-browser open https://example.com --headed
```

If the browser opens successfully, installation is complete.

### Troubleshooting

| Error | Solution |
|-------|----------|
| `No binary found for darwin-arm64` | Run `bun playwright install chromium` in a project with playwright dependency |
| `Executable doesn't exist at .../chromium-XXXX` | Re-run `bun playwright install chromium` |
| Browser doesn't open | Ensure `--headed` flag is used for visible browser |

---
Run `agent-browser --help` for all commands. Repo: https://github.com/vercel-labs/agent-browser
Install: `bun add -g agent-browser && agent-browser install`. Run `agent-browser --help` for all commands. Repo: https://github.com/vercel-labs/agent-browser

@@ -40,29 +40,35 @@ agent-browser close # Close browser

### Navigation
\`\`\`bash
agent-browser open <url> # Navigate to URL
agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
agent-browser close # Close browser (aliases: quit, exit)
\`\`\`

### Snapshot (page analysis)
\`\`\`bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, etc.)
agent-browser snapshot -c # Compact (remove empty structural elements)
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
agent-browser snapshot -i -c -d 5 # Combine options
\`\`\`

The \`-C\` flag is useful for modern web apps that use custom clickable elements (divs, spans) instead of standard buttons/links.

### Interactions (use @refs from snapshot)
\`\`\`bash
agent-browser click @e1 # Click
agent-browser click @e1 # Click (--new-tab to open in new tab)
agent-browser dblclick @e1 # Double-click
agent-browser focus @e1 # Focus element
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser keyboard type "text" # Type with real keystrokes (no selector, current focus)
agent-browser keyboard inserttext "text" # Insert text without key events (no selector)
agent-browser press Enter # Press key
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
@@ -71,8 +77,8 @@ agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown
agent-browser scroll down 500 # Scroll page
agent-browser scrollintoview @e1 # Scroll element into view
agent-browser scroll down 500 # Scroll page (--selector <sel> for container)
agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto)
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
\`\`\`
@@ -87,6 +93,7 @@ agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
agent-browser get styles @e1 # Get computed styles
\`\`\`

### Check state
@@ -98,12 +105,20 @@ agent-browser is checked @e1 # Check if checked

### Screenshots & PDF
\`\`\`bash
agent-browser screenshot # Screenshot to stdout
agent-browser screenshot # Screenshot (saves to temp dir if no path)
agent-browser screenshot path.png # Save to file
agent-browser screenshot --full # Full page
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
\`\`\`

Annotated screenshots overlay numbered labels \`[N]\` on interactive elements. Each label corresponds to ref \`@eN\`, so refs work for both visual and text workflows:
\`\`\`bash
agent-browser screenshot --annotate ./page.png
# Output: [1] @e1 button "Submit", [2] @e2 link "Home", [3] @e3 textbox "Email"
agent-browser click @e2 # Click the "Home" link labeled [2]
\`\`\`

### Video recording
\`\`\`bash
agent-browser record start ./demo.webm # Start recording (uses current URL + state)
@@ -123,10 +138,12 @@ agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --fn "window.ready" # Wait for JS condition
\`\`\`

Load states: \`load\`, \`domcontentloaded\`, \`networkidle\`

### Mouse control
\`\`\`bash
agent-browser mouse move 100 200 # Move mouse
agent-browser mouse down left # Press button
agent-browser mouse down left # Press button (left/right/middle)
agent-browser mouse up left # Release button
agent-browser mouse wheel 100 # Scroll wheel
\`\`\`
@@ -136,10 +153,18 @@ agent-browser mouse wheel 100 # Scroll wheel
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search..." fill "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" text
\`\`\`

Actions: \`click\`, \`fill\`, \`type\`, \`hover\`, \`focus\`, \`check\`, \`uncheck\`, \`text\`
Options: \`--name <name>\` (filter role by accessible name), \`--exact\` (require exact text match)

### Browser settings
\`\`\`bash
agent-browser set viewport 1920 1080 # Set viewport size
@@ -156,14 +181,13 @@ agent-browser set media dark # Emulate color scheme
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies

agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
agent-browser storage session # Get all sessionStorage
agent-browser storage session key # Get specific key
agent-browser storage session set k v # Set value
agent-browser storage session clear # Clear all

agent-browser storage session # Same for sessionStorage
\`\`\`

### Network
@@ -193,13 +217,59 @@ agent-browser frame main # Back to main frame

### Dialogs
\`\`\`bash
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog accept [text] # Accept dialog (with optional prompt text)
agent-browser dialog dismiss # Dismiss dialog
\`\`\`

### Diff (compare snapshots, screenshots, URLs)
\`\`\`bash
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved snapshot file
agent-browser diff snapshot --selector "#main" --compact # Scoped snapshot diff
agent-browser diff screenshot --baseline before.png # Visual pixel diff against baseline
agent-browser diff screenshot --baseline b.png -o d.png # Save diff image to custom path
agent-browser diff screenshot --baseline b.png -t 0.2 # Adjust color threshold (0-1)
agent-browser diff url https://v1.com https://v2.com # Compare two URLs (snapshot diff)
agent-browser diff url https://v1.com https://v2.com --screenshot # Also visual diff
agent-browser diff url https://v1.com https://v2.com --selector "#main" # Scope to element
\`\`\`

### JavaScript
\`\`\`bash
agent-browser eval "document.title" # Run JavaScript
agent-browser eval -b "base64code" # Run base64-encoded JS
agent-browser eval --stdin # Read JS from stdin
\`\`\`

### Debug & Profiling
\`\`\`bash
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop profile.json # Stop and save profile
\`\`\`

### State management
\`\`\`bash
agent-browser state save auth.json # Save auth state
agent-browser state load auth.json # Load auth state
agent-browser state list # List saved state files
agent-browser state show <file> # Show state summary
agent-browser state rename <old> <new> # Rename state file
agent-browser state clear [name] # Clear states for session
agent-browser state clear --all # Clear all saved states
agent-browser state clean --older-than <days> # Delete old states
\`\`\`

### Setup
\`\`\`bash
agent-browser install # Download Chromium browser
agent-browser install --with-deps # Also install system deps (Linux)
\`\`\`

## Global Options
@@ -207,19 +277,60 @@ agent-browser eval "document.title" # Run JavaScript
| Option | Description |
|--------|-------------|
| \`--session <name>\` | Isolated browser session (\`AGENT_BROWSER_SESSION\` env) |
| \`--session-name <name>\` | Auto-save/restore session state (\`AGENT_BROWSER_SESSION_NAME\` env) |
| \`--profile <path>\` | Persistent browser profile (\`AGENT_BROWSER_PROFILE\` env) |
| \`--state <path>\` | Load storage state from JSON file (\`AGENT_BROWSER_STATE\` env) |
| \`--headers <json>\` | HTTP headers scoped to URL's origin |
| \`--executable-path <path>\` | Custom browser binary (\`AGENT_BROWSER_EXECUTABLE_PATH\` env) |
| \`--extension <path>\` | Load browser extension (repeatable; \`AGENT_BROWSER_EXTENSIONS\` env) |
| \`--args <args>\` | Browser launch args (\`AGENT_BROWSER_ARGS\` env) |
| \`--user-agent <ua>\` | Custom User-Agent (\`AGENT_BROWSER_USER_AGENT\` env) |
| \`--proxy <url>\` | Proxy server (\`AGENT_BROWSER_PROXY\` env) |
| \`--proxy-bypass <hosts>\` | Hosts to bypass proxy (\`AGENT_BROWSER_PROXY_BYPASS\` env) |
| \`--ignore-https-errors\` | Ignore HTTPS certificate errors |
| \`--allow-file-access\` | Allow file:// URLs to access local files |
| \`-p, --provider <name>\` | Cloud browser provider (\`AGENT_BROWSER_PROVIDER\` env) |
| \`--device <name>\` | iOS device name (\`AGENT_BROWSER_IOS_DEVICE\` env) |
| \`--json\` | Machine-readable JSON output |
| \`--headed\` | Show browser window (not headless) |
| \`--full, -f\` | Full page screenshot |
| \`--annotate\` | Annotated screenshot with numbered labels (\`AGENT_BROWSER_ANNOTATE\` env) |
| \`--headed\` | Show browser window (\`AGENT_BROWSER_HEADED\` env) |
| \`--cdp <port\\|wss://url>\` | Connect via Chrome DevTools Protocol |
| \`--auto-connect\` | Auto-discover running Chrome (\`AGENT_BROWSER_AUTO_CONNECT\` env) |
| \`--color-scheme <scheme>\` | Color scheme: dark, light, no-preference (\`AGENT_BROWSER_COLOR_SCHEME\` env) |
| \`--download-path <path>\` | Default download directory (\`AGENT_BROWSER_DOWNLOAD_PATH\` env) |
| \`--native\` | [Experimental] Use native Rust daemon (\`AGENT_BROWSER_NATIVE\` env) |
| \`--config <path>\` | Custom config file (\`AGENT_BROWSER_CONFIG\` env) |
| \`--debug\` | Debug output |

### Security options
| Option | Description |
|--------|-------------|
| \`--content-boundaries\` | Wrap page output in boundary markers (\`AGENT_BROWSER_CONTENT_BOUNDARIES\` env) |
| \`--max-output <chars>\` | Truncate page output to N characters (\`AGENT_BROWSER_MAX_OUTPUT\` env) |
| \`--allowed-domains <list>\` | Comma-separated allowed domain patterns (\`AGENT_BROWSER_ALLOWED_DOMAINS\` env) |
| \`--action-policy <path>\` | Path to action policy JSON file (\`AGENT_BROWSER_ACTION_POLICY\` env) |
| \`--confirm-actions <list>\` | Action categories requiring confirmation (\`AGENT_BROWSER_CONFIRM_ACTIONS\` env) |

## Configuration file

Create \`agent-browser.json\` for persistent defaults (no need to repeat flags):

**Locations (lowest to highest priority):**
1. \`~/.agent-browser/config.json\` — user-level defaults
2. \`./agent-browser.json\` — project-level overrides
3. \`AGENT_BROWSER_*\` environment variables
4. CLI flags override everything

\`\`\`json
{
  "headed": true,
  "proxy": "http://localhost:8080",
  "profile": "./browser-data",
  "native": true
}
\`\`\`

## Example: Form submission

\`\`\`bash
@@ -261,6 +372,13 @@ agent-browser open other-site.com
agent-browser set headers '{"X-Custom-Header": "value"}'
\`\`\`

### Authentication Vault
\`\`\`bash
# Store credentials locally (encrypted). The LLM never sees passwords.
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
agent-browser auth login github
\`\`\`

## Sessions & Persistent Profiles

### Sessions (parallel browsers)
@@ -270,6 +388,13 @@ agent-browser --session test2 open site-b.com
agent-browser session list
\`\`\`

### Session persistence (auto-save/restore)
\`\`\`bash
agent-browser --session-name twitter open twitter.com
# Login once, state persists automatically across restarts
# State files stored in ~/.agent-browser/sessions/
\`\`\`

### Persistent Profiles
Persists cookies, localStorage, IndexedDB, service workers, cache, login sessions across browser restarts.
\`\`\`bash
@@ -277,9 +402,6 @@ agent-browser --profile ~/.myapp-profile open myapp.com
# Or via env var
AGENT_BROWSER_PROFILE=~/.myapp-profile agent-browser open myapp.com
\`\`\`
- Use different profile paths for different projects
- Login once → restart browser → still logged in
- Stores: cookies, localStorage, IndexedDB, service workers, browser cache

## JSON output (for parsing)

@@ -289,21 +411,53 @@ agent-browser snapshot -i --json
agent-browser get text @e1 --json
\`\`\`

## Debugging
## Local files

\`\`\`bash
agent-browser open example.com --headed # Show browser window
agent-browser console # View console messages
agent-browser errors # View page errors
agent-browser record start ./debug.webm # Record from current page
agent-browser record stop # Save recording
agent-browser connect 9222 # Local CDP port
agent-browser --allow-file-access open file:///path/to/document.pdf
agent-browser --allow-file-access open file:///path/to/page.html
\`\`\`

## CDP Mode

\`\`\`bash
agent-browser connect 9222 # Local CDP port
agent-browser --cdp 9222 snapshot # Direct CDP on each command
agent-browser --cdp "wss://browser-service.com/cdp?token=..." snapshot # Remote via WebSocket
agent-browser console --clear # Clear console
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser --auto-connect snapshot # Auto-discover running Chrome
\`\`\`

## Cloud providers

\`\`\`bash
# Browserbase
BROWSERBASE_API_KEY="key" BROWSERBASE_PROJECT_ID="id" agent-browser -p browserbase open example.com

# Browser Use
BROWSER_USE_API_KEY="key" agent-browser -p browseruse open example.com

# Kernel
KERNEL_API_KEY="key" agent-browser -p kernel open example.com
\`\`\`

## iOS Simulator

\`\`\`bash
agent-browser device list # List available simulators
agent-browser -p ios --device "iPhone 16 Pro" open example.com # Launch Safari
agent-browser -p ios snapshot -i # Same commands as desktop
agent-browser -p ios tap @e1 # Tap
agent-browser -p ios swipe up # Mobile-specific
agent-browser -p ios close # Close session
\`\`\`

## Native Mode (Experimental)

Pure Rust daemon using direct CDP — no Node.js/Playwright required:
\`\`\`bash
agent-browser --native open example.com
# Or: export AGENT_BROWSER_NATIVE=1
# Or: {"native": true} in agent-browser.json
\`\`\`

---