# Browser Automation Integration Guide ## Overview ZCLAW now includes browser automation capabilities powered by **Fantoccini** (Rust WebDriver client). This enables the Browser Hand to automate web browsers for testing, scraping, and automation tasks. ## Architecture ``` ┌─────────────────────────────────────────────────────────────┐ │ Frontend (React) │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ browser-client.ts │ │ │ │ - createSession() / closeSession() │ │ │ │ - navigate() / click() / type() │ │ │ │ - screenshot() / scrapePage() │ │ │ └─────────────────────┬───────────────────────────────┘ │ └────────────────────────┼────────────────────────────────────┘ │ Tauri invoke() ▼ ┌─────────────────────────────────────────────────────────────┐ │ Tauri Backend (Rust) │ │ ┌─────────────────────────────────────────────────────┐ │ │ │ browser/commands.rs │ │ │ │ - Tauri command handlers │ │ │ └─────────────────────┬───────────────────────────────┘ │ │ │ │ │ ┌─────────────────────▼───────────────────────────────┐ │ │ │ browser/client.rs │ │ │ │ - BrowserClient (WebDriver connection) │ │ │ │ - Session management │ │ │ │ - Element operations │ │ │ └─────────────────────┬───────────────────────────────┘ │ │ │ │ │ ┌─────────────────────▼───────────────────────────────┐ │ │ │ Fantoccini (WebDriver Protocol) │ │ │ └─────────────────────┬───────────────────────────────┘ │ └────────────────────────┼────────────────────────────────────┘ │ WebDriver Protocol ▼ ┌─────────────────────────────────────────────────────────────┐ │ ChromeDriver / GeckoDriver │ │ (Requires separate installation) │ └─────────────────────────────────────────────────────────────┘ ``` ## Prerequisites ### 1. Install WebDriver You need a WebDriver installed and running: ```bash # Chrome (ChromeDriver) # Download from: https://chromedriver.chromium.org/ chromedriver --port=4444 # Firefox (geckodriver) # Download from: https://github.com/mozilla/geckodriver geckodriver --port=4444 ``` ### 2. Verify WebDriver is Running ```bash curl http://localhost:4444/status ``` ## Usage Examples ### Basic Usage (Functional API) ```typescript import { createSession, navigate, click, screenshot, closeSession } from './lib/browser-client'; async function example() { // Create session const { session_id } = await createSession({ headless: true, browserType: 'chrome', }); try { // Navigate await navigate(session_id, 'https://example.com'); // Click element await click(session_id, 'button.submit'); // Take screenshot const { base64 } = await screenshot(session_id); console.log('Screenshot taken, size:', base64.length); } finally { // Always close session await closeSession(session_id); } } ``` ### Using Browser Class (Recommended) ```typescript import Browser from './lib/browser-client'; async function scrapeData() { const browser = new Browser(); try { // Start browser await browser.start({ headless: true }); // Navigate await browser.goto('https://example.com/products'); // Wait for products to load await browser.wait('.product-list', 5000); // Scrape product data const data = await browser.scrape( ['.product-name', '.product-price', '.product-description'], '.product-list' ); console.log('Products:', data); } finally { await browser.close(); } } ``` ### Form Filling ```typescript import Browser from './lib/browser-client'; async function fillForm() { const browser = new Browser(); try { await browser.start(); await browser.goto('https://example.com/login'); // Fill login form await browser.fillForm([ { selector: 'input[name="email"]', value: 'user@example.com' }, { selector: 'input[name="password"]', value: 'password123' }, ], 'button[type="submit"]'); // Wait for redirect await browser.wait('.dashboard', 5000); // Take screenshot of logged-in state const { base64 } = await browser.screenshot(); } finally { await browser.close(); } } ``` ### Integration with Hands System ```typescript // In your Hand implementation import Browser from '../lib/browser-client'; export class BrowserHand implements Hand { name = 'browser'; description = 'Automates web browser interactions'; async execute(task: BrowserTask): Promise { const browser = new Browser(); try { await browser.start({ headless: true }); switch (task.action) { case 'scrape': await browser.goto(task.url); return { success: true, data: await browser.scrape(task.selectors) }; case 'screenshot': await browser.goto(task.url); return { success: true, data: await browser.screenshot() }; case 'interact': await browser.goto(task.url); for (const step of task.steps) { if (step.type === 'click') await browser.click(step.selector); if (step.type === 'type') await browser.type(step.selector, step.value); } return { success: true }; default: return { success: false, error: 'Unknown action' }; } } finally { await browser.close(); } } } ``` ## API Reference ### Session Management | Function | Description | |----------|-------------| | `createSession(options)` | Create new browser session | | `closeSession(sessionId)` | Close browser session | | `listSessions()` | List all active sessions | | `getSession(sessionId)` | Get session info | ### Navigation | Function | Description | |----------|-------------| | `navigate(sessionId, url)` | Navigate to URL | | `back(sessionId)` | Go back | | `forward(sessionId)` | Go forward | | `refresh(sessionId)` | Refresh page | | `getCurrentUrl(sessionId)` | Get current URL | | `getTitle(sessionId)` | Get page title | ### Element Operations | Function | Description | |----------|-------------| | `findElement(sessionId, selector)` | Find single element | | `findElements(sessionId, selector)` | Find multiple elements | | `click(sessionId, selector)` | Click element | | `typeText(sessionId, selector, text, clearFirst?)` | Type into element | | `getText(sessionId, selector)` | Get element text | | `getAttribute(sessionId, selector, attr)` | Get element attribute | | `waitForElement(sessionId, selector, timeout?)` | Wait for element | ### Advanced | Function | Description | |----------|-------------| | `executeScript(sessionId, script, args?)` | Execute JavaScript | | `screenshot(sessionId)` | Take page screenshot | | `elementScreenshot(sessionId, selector)` | Take element screenshot | | `getSource(sessionId)` | Get page HTML source | ### High-Level | Function | Description | |----------|-------------| | `scrapePage(sessionId, selectors, waitFor?, timeout?)` | Scrape multiple selectors | | `fillForm(sessionId, fields, submitSelector?)` | Fill and submit form | ## Configuration ### Environment Variables ```bash # WebDriver URL (default: http://localhost:4444) WEBDRIVER_URL=http://localhost:4444 ``` ### Session Options ```typescript interface SessionOptions { webdriverUrl?: string; // WebDriver server URL headless?: boolean; // Run headless (default: true) browserType?: 'chrome' | 'firefox' | 'edge' | 'safari'; windowWidth?: number; // Window width in pixels windowHeight?: number; // Window height in pixels } ``` ## Troubleshooting ### WebDriver Not Found ``` Error: WebDriver connection failed ``` **Solution**: Ensure ChromeDriver or geckodriver is running: ```bash chromedriver --port=4444 # or geckodriver --port=4444 ``` ### Element Not Found ``` Error: Element not found: .my-selector ``` **Solution**: Use `waitForElement` with appropriate timeout: ```typescript await browser.wait('.my-selector', 10000); ``` ### Session Timeout ``` Error: Session not found ``` **Solution**: Session may have expired. Create a new session. ## Future Enhancements - [ ] WebDriver auto-detection and management - [ ] Built-in ChromeDriver bundling - [ ] Lightpanda integration for high-performance scenarios - [ ] WebMCP integration for Chrome 146+ features - [ ] Screenshot diff comparison - [ ] Network request interception - [ ] Cookie and storage management