zclaw_openfang/docs/features/02-intelligence-layer/browser-automation-integration.md
iven f4efc823e2 refactor(types): comprehensive TypeScript type system improvements
2026-03-17 08:05:07 +08:00


Browser Automation Integration Guide

Overview

ZCLAW includes browser automation powered by Fantoccini, a Rust WebDriver client. It lets the Browser Hand drive a real web browser for testing, scraping, and other scripted tasks.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                        │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  browser-client.ts                                   │   │
│  │  - createSession() / closeSession()                 │   │
│  │  - navigate() / click() / type()                    │   │
│  │  - screenshot() / scrapePage()                      │   │
│  └─────────────────────┬───────────────────────────────┘   │
└────────────────────────┼────────────────────────────────────┘
                         │ Tauri invoke()
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Tauri Backend (Rust)                      │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  browser/commands.rs                                 │   │
│  │  - Tauri command handlers                           │   │
│  └─────────────────────┬───────────────────────────────┘   │
│                        │                                    │
│  ┌─────────────────────▼───────────────────────────────┐   │
│  │  browser/client.rs                                   │   │
│  │  - BrowserClient (WebDriver connection)             │   │
│  │  - Session management                               │   │
│  │  - Element operations                               │   │
│  └─────────────────────┬───────────────────────────────┘   │
│                        │                                    │
│  ┌─────────────────────▼───────────────────────────────┐   │
│  │  Fantoccini (WebDriver Protocol)                     │   │
│  └─────────────────────┬───────────────────────────────┘   │
└────────────────────────┼────────────────────────────────────┘
                         │ WebDriver Protocol
                         ▼
┌─────────────────────────────────────────────────────────────┐
│              ChromeDriver / GeckoDriver                      │
│              (Requires separate installation)                │
└─────────────────────────────────────────────────────────────┘
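In this design, each function in browser-client.ts is a thin wrapper that forwards its arguments to a Tauri command in browser/commands.rs. The sketch below shows that layering.

The command names (`browser_create_session` and so on) are hypothetical placeholders; check commands.rs for the real ones. `invoke` is passed in as a parameter so the sketch runs outside Tauri. In the app you would pass Tauri's own `invoke`, from `@tauri-apps/api/core` in Tauri 2 or `@tauri-apps/api/tauri` in Tauri 1. By default, Tauri maps camelCase argument keys such as `sessionId` to snake_case Rust parameters.

```typescript
// Same call shape as Tauri's `invoke`; injected so this sketch runs without Tauri.
type Invoke = <T>(cmd: string, args?: Record<string, unknown>) => Promise<T>;

interface CreateSessionResult {
  session_id: string;
}

// Hypothetical command names; the real ones are defined in browser/commands.rs.
function makeBrowserClient(invoke: Invoke) {
  return {
    createSession: (options: { headless?: boolean; browserType?: string } = {}) =>
      invoke<CreateSessionResult>('browser_create_session', { options }),
    navigate: (sessionId: string, url: string) =>
      invoke<void>('browser_navigate', { sessionId, url }),
    closeSession: (sessionId: string) =>
      invoke<void>('browser_close_session', { sessionId }),
  };
}
```

Keeping the frontend this thin leaves all WebDriver state (sessions, element handles) in the Rust process. The React side only ever passes session IDs, URLs, and selectors.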

Prerequisites

1. Install WebDriver

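You need a WebDriver server installed and running on port 4444, the default address ZCLAW connects to (see Configuration). ChromeDriver listens on port 9515 unless told otherwise, so pass the port explicitly: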

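# Run only one of these; both bind port 4444.

# Chrome (ChromeDriver)
# Download from: https://chromedriver.chromium.org/
# The ChromeDriver major version must match your installed Chrome;
# for Chrome 115 and later, builds are published through Chrome for Testing.
chromedriver --port=4444

# Firefox (geckodriver)
# Download from: https://github.com/mozilla/geckodriver
geckodriver --port=4444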

2. Verify WebDriver is Running

curl http://localhost:4444/status
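The W3C WebDriver specification defines `GET /status` as returning `{ "value": { "ready": boolean, "message": string } }`. That means the same check can be done from code before creating a session.

`checkWebDriver` and `parseStatus` are hypothetical helpers, not part of browser-client.ts. The sketch relies on the global `fetch`, which is available in browsers and in Node 18+.

```typescript
interface WebDriverStatus {
  ready: boolean;
  message: string;
}

// Read value.ready / value.message from a /status body, tolerating unexpected shapes.
function parseStatus(body: unknown): WebDriverStatus {
  const value = (body as { value?: { ready?: unknown; message?: unknown } } | null)?.value;
  const message = value?.message;
  return {
    ready: value?.ready === true,
    message: typeof message === 'string' ? message : '',
  };
}

// Hypothetical helper: resolves false (instead of throwing) when nothing is listening.
async function checkWebDriver(baseUrl = 'http://localhost:4444'): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/status`);
    return res.ok && parseStatus(await res.json()).ready;
  } catch {
    return false;
  }
}
```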

Usage Examples

Basic Usage (Functional API)

import { createSession, navigate, click, screenshot, closeSession } from './lib/browser-client';

async function example() {
  // Create session
  const { session_id } = await createSession({
    headless: true,
    browserType: 'chrome',
  });

  try {
    // Navigate
    await navigate(session_id, 'https://example.com');

    // Click element
    await click(session_id, 'button.submit');

    // Take screenshot
    const { base64 } = await screenshot(session_id);
    console.log('Screenshot taken, size:', base64.length);

  } finally {
    // Always close session
    await closeSession(session_id);
  }
}
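Every session holds a live browser process, so the try/finally pattern above is worth factoring out once. This sketch assumes the `createSession`/`closeSession` shapes used in the example.

`withSession` and `SessionApi` are hypothetical names. The two functions are passed in so the helper can be exercised with fakes.

```typescript
interface SessionApi<O> {
  createSession(options: O): Promise<{ session_id: string }>;
  closeSession(sessionId: string): Promise<unknown>;
}

// Run `fn` inside a fresh session and always close the session afterwards,
// even if `fn` throws. Resolves to whatever `fn` resolves to.
async function withSession<O, T>(
  api: SessionApi<O>,
  options: O,
  fn: (sessionId: string) => Promise<T>,
): Promise<T> {
  const { session_id } = await api.createSession(options);
  try {
    return await fn(session_id);
  } finally {
    await api.closeSession(session_id);
  }
}
```

In the app this would read `withSession({ createSession, closeSession }, { headless: true }, async (id) => { await navigate(id, 'https://example.com'); return screenshot(id); })`. Note that if `closeSession` itself rejects, that error replaces any error from `fn`. This is standard `finally` semantics.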

Scraping with the Browser Class

import Browser from './lib/browser-client';

async function scrapeData() {
  const browser = new Browser();

  try {
    // Start browser
    await browser.start({ headless: true });

    // Navigate
    await browser.goto('https://example.com/products');

    // Wait for products to load
    await browser.wait('.product-list', 5000);

    // Scrape product data
    const data = await browser.scrape(
      ['.product-name', '.product-price', '.product-description'],
      '.product-list'
    );

    console.log('Products:', data);

  } finally {
    await browser.close();
  }
}
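The scrape call above takes one selector per field. Assume it resolves to a map from each selector to the text of every match, in page order; this is an assumption, so check the actual return type in browser-client.ts. Under that assumption, the columns can be zipped into one record per product.

`zipColumns` is a hypothetical helper.

```typescript
// Assumed shape of browser.scrape(): selector -> text of each match, in page order.
type ScrapeResult = Record<string, string[]>;

// Turn per-selector columns into one object per item (e.g. one per product).
// `fields` maps an output key to the selector that was scraped for it.
function zipColumns(
  result: ScrapeResult,
  fields: Record<string, string>,
): Array<Record<string, string>> {
  const entries = Object.entries(fields);
  if (entries.length === 0) return [];
  // Stop at the shortest column so every row has every field.
  const count = Math.min(...entries.map(([, selector]) => result[selector]?.length ?? 0));
  const rows: Array<Record<string, string>> = [];
  for (let i = 0; i < count; i++) {
    const row: Record<string, string> = {};
    for (const [key, selector] of entries) {
      row[key] = result[selector][i];
    }
    rows.push(row);
  }
  return rows;
}
```

For the example above, the call would be `zipColumns(data, { name: '.product-name', price: '.product-price', description: '.product-description' })`. Zipping by index assumes every product has every field. If one product lacks a price, later prices shift onto the wrong names, so optional fields are safer to scrape per product container.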

Form Filling

import Browser from './lib/browser-client';

async function fillForm() {
  const browser = new Browser();

  try {
    await browser.start();
    await browser.goto('https://example.com/login');

    // Fill login form
    await browser.fillForm([
      { selector: 'input[name="email"]', value: 'user@example.com' },
      { selector: 'input[name="password"]', value: 'password123' },
    ], 'button[type="submit"]');

    // Wait for redirect
    await browser.wait('.dashboard', 5000);

    // Take screenshot of logged-in state
    const { base64 } = await browser.screenshot();

  } finally {
    await browser.close();
  }
}
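Conceptually, fillForm is a sequence of `typeText` calls followed by an optional click on the submit button. The sketch below shows that composition, assuming the `typeText` and `click` signatures from the API reference.

`FormApi` and `fillFormSketch` are hypothetical. The primitives are injected so the call order can be checked without a browser. Fields are typed one at a time, in order, because typing order can matter (focus changes, inputs that validate on blur). `clearFirst` is set so pre-filled values are replaced rather than appended to.

```typescript
interface FormField {
  selector: string;
  value: string;
}

// Modelled on typeText/click from the API reference; injected for testability.
interface FormApi {
  typeText(sessionId: string, selector: string, text: string, clearFirst?: boolean): Promise<void>;
  click(sessionId: string, selector: string): Promise<void>;
}

// Fill each field in order (clearing any existing value), then submit if asked.
async function fillFormSketch(
  api: FormApi,
  sessionId: string,
  fields: FormField[],
  submitSelector?: string,
): Promise<void> {
  for (const { selector, value } of fields) {
    await api.typeText(sessionId, selector, value, true);
  }
  if (submitSelector) {
    await api.click(sessionId, submitSelector);
  }
}
```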

Integration with the Hands System

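// In your Hand implementation
import Browser from '../lib/browser-client';

// Illustrative shapes for this example; use the real Hand types from your Hands module.
type BrowserStep =
  | { type: 'click'; selector: string }
  | { type: 'type'; selector: string; value: string };

type BrowserTask =
  | { action: 'scrape'; url: string; selectors: string[] }
  | { action: 'screenshot'; url: string }
  | { action: 'interact'; url: string; steps: BrowserStep[] };

interface HandResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

interface Hand {
  name: string;
  description: string;
  execute(task: BrowserTask): Promise<HandResult>;
}

export class BrowserHand implements Hand {
  name = 'browser';
  description = 'Automates web browser interactions';

  async execute(task: BrowserTask): Promise<HandResult> {
    const browser = new Browser();

    try {
      await browser.start({ headless: true });

      switch (task.action) {
        case 'scrape':
          await browser.goto(task.url);
          return { success: true, data: await browser.scrape(task.selectors) };

        case 'screenshot':
          await browser.goto(task.url);
          return { success: true, data: await browser.screenshot() };

        case 'interact':
          await browser.goto(task.url);
          for (const step of task.steps) {
            if (step.type === 'click') await browser.click(step.selector);
            else await browser.type(step.selector, step.value);
          }
          return { success: true };

        default:
          // Unreachable for well-typed tasks; guards against malformed input at runtime.
          return { success: false, error: 'Unknown action' };
      }
    } catch (err) {
      // Report browser failures as a failed HandResult instead of throwing.
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    } finally {
      // Always release the WebDriver session.
      await browser.close();
    }
  }
}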

API Reference

Session Management

| Function | Description |
|---|---|
| createSession(options) | Create new browser session |
| closeSession(sessionId) | Close browser session |
| listSessions() | List all active sessions |
| getSession(sessionId) | Get session info |

Navigation

| Function | Description |
|---|---|
| navigate(sessionId, url) | Navigate to URL |
| back(sessionId) | Go back |
| forward(sessionId) | Go forward |
| refresh(sessionId) | Refresh page |
| getCurrentUrl(sessionId) | Get current URL |
| getTitle(sessionId) | Get page title |

Element Operations

| Function | Description |
|---|---|
| findElement(sessionId, selector) | Find single element |
| findElements(sessionId, selector) | Find multiple elements |
| click(sessionId, selector) | Click element |
| typeText(sessionId, selector, text, clearFirst?) | Type into element |
| getText(sessionId, selector) | Get element text |
| getAttribute(sessionId, selector, attr) | Get element attribute |
| waitForElement(sessionId, selector, timeout?) | Wait for element |
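A bounded wait such as waitForElement is commonly implemented by polling: retry the lookup at a fixed interval until it succeeds or a deadline passes. This is a generic sketch of that technique, not ZCLAW's implementation. `waitFor`, the 5000 ms default timeout, and the 100 ms interval are illustrative. WebDriver also has a session-level implicit-wait timeout, which makes the driver itself retry element lookups.

```typescript
// Poll `find` until it resolves or `timeoutMs` elapses. `find` should reject
// while the element is absent (as a "no such element" lookup error would).
async function waitFor<T>(
  find: () => Promise<T>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    try {
      return await find();
    } catch (err) {
      if (Date.now() >= deadline) {
        const reason = err instanceof Error ? err.message : String(err);
        throw new Error(`Timed out after ${timeoutMs} ms: ${reason}`);
      }
      // Sleep before the next attempt instead of hammering the driver.
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
}
```

With the functional API this might look like `waitFor(() => findElement(sessionId, '.my-selector'), 10000)`, assuming `findElement` rejects when nothing matches.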

Advanced

| Function | Description |
|---|---|
| executeScript(sessionId, script, args?) | Execute JavaScript |
| screenshot(sessionId) | Take page screenshot |
| elementScreenshot(sessionId, selector) | Take element screenshot |
| getSource(sessionId) | Get page HTML source |
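The W3C WebDriver specification has both Take Screenshot and Take Element Screenshot return a base64-encoded PNG. So the `base64` string from screenshot() can be shown in the React frontend without decoding it.

This sketch assumes `screenshot` returns the raw base64 payload with no `data:` prefix. `toPngDataUrl` is a hypothetical helper. Its check relies on every PNG starting with the same 8-byte signature, whose base64 encoding begins with `iVBORw0KGgo`.

```typescript
// Base64 encoding of the PNG signature bytes 89 50 4E 47 0D 0A 1A 0A starts with this.
const PNG_BASE64_PREFIX = 'iVBORw0KGgo';

// Turn a WebDriver screenshot payload into a data URL usable as an <img src>.
function toPngDataUrl(base64: string): string {
  const payload = base64.trim();
  if (!payload.startsWith(PNG_BASE64_PREFIX)) {
    throw new Error('Screenshot payload is not a base64-encoded PNG');
  }
  return `data:image/png;base64,${payload}`;
}
```

In a component this becomes `<img src={toPngDataUrl(base64)} />`.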

High-Level

| Function | Description |
|---|---|
| scrapePage(sessionId, selectors, waitFor?, timeout?) | Scrape multiple selectors |
| fillForm(sessionId, fields, submitSelector?) | Fill and submit form |

Configuration

Environment Variables

# WebDriver URL (default: http://localhost:4444)
WEBDRIVER_URL=http://localhost:4444
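This guide does not spell out the precedence between `WEBDRIVER_URL` and the per-session `webdriverUrl` option. This sketch assumes one reasonable order: explicit option first, then the environment variable, then the documented default.

`resolveWebDriverUrl` is hypothetical. The environment is passed in as a plain object so the function has no Node-specific globals; pass `process.env` when calling it from Node.

```typescript
const DEFAULT_WEBDRIVER_URL = 'http://localhost:4444';

// Assumed precedence: explicit option > WEBDRIVER_URL > default.
// Empty or whitespace-only values are treated as unset.
function resolveWebDriverUrl(
  option: string | undefined,
  env: Record<string, string | undefined>,
): string {
  const raw = option?.trim() || env.WEBDRIVER_URL?.trim() || DEFAULT_WEBDRIVER_URL;
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported WebDriver URL: ${raw}`);
  }
  // Drop trailing slashes so paths like `${base}/status` join cleanly.
  return raw.replace(/\/+$/, '');
}
```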

Session Options

interface SessionOptions {
  webdriverUrl?: string;      // WebDriver server URL
  headless?: boolean;         // Run headless (default: true)
  browserType?: 'chrome' | 'firefox' | 'edge' | 'safari'; // Browser to launch
  windowWidth?: number;       // Window width in pixels
  windowHeight?: number;      // Window height in pixels
}
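WebDriver sessions are created from a JSON capabilities object (W3C New Session). Browser-specific settings go under vendor keys such as `goog:chromeOptions`, `ms:edgeOptions`, and `moz:firefoxOptions`.

The Rust backend builds this object; the TypeScript sketch below only illustrates one plausible mapping from SessionOptions. `toCapabilities` is hypothetical. Defaulting to Chrome is an assumption; only the `headless: true` default is documented above.

```typescript
// Mirrors the SessionOptions interface documented in this guide.
interface SessionOptions {
  webdriverUrl?: string;
  headless?: boolean;
  browserType?: 'chrome' | 'firefox' | 'edge' | 'safari';
  windowWidth?: number;
  windowHeight?: number;
}

// W3C capabilities: a standard browserName plus vendor-prefixed option blocks.
type Capabilities = { browserName: string } & Record<string, unknown>;

// Hypothetical helper: one plausible SessionOptions -> capabilities mapping.
function toCapabilities(opts: SessionOptions = {}): Capabilities {
  const browser = opts.browserType ?? 'chrome'; // assumed default
  const headless = opts.headless ?? true; // documented default

  switch (browser) {
    case 'chrome':
    case 'edge': {
      // Chromium-based drivers pass `args` to the browser as command-line switches.
      const args: string[] = headless ? ['--headless'] : [];
      if (opts.windowWidth && opts.windowHeight) {
        args.push(`--window-size=${opts.windowWidth},${opts.windowHeight}`);
      }
      return browser === 'chrome'
        ? { browserName: 'chrome', 'goog:chromeOptions': { args } }
        : { browserName: 'MicrosoftEdge', 'ms:edgeOptions': { args } };
    }
    case 'firefox':
      // Window size is left to Set Window Rect after the session starts.
      return { browserName: 'firefox', 'moz:firefoxOptions': { args: headless ? ['-headless'] : [] } };
    case 'safari':
      // safaridriver cannot run Safari headless.
      if (headless) throw new Error('Safari has no headless mode; set headless: false');
      return { browserName: 'safari' };
    default: {
      const unsupported: never = browser;
      throw new Error(`Unsupported browser: ${String(unsupported)}`);
    }
  }
}
```

For Firefox and Safari, the window size would be applied after the session is created, through the standard Set Window Rect command (`POST /session/{id}/window/rect`).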

Troubleshooting

WebDriver Not Found

Error: WebDriver connection failed

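Solution: Make sure ChromeDriver or geckodriver is running and listening on the URL ZCLAW is configured to use (WEBDRIVER_URL, default http://localhost:4444):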

chromedriver --port=4444
# or
geckodriver --port=4444

Element Not Found

Error: Element not found: .my-selector

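Solution: The element may not have rendered yet. Wait for it with a long enough timeout (waitForElement in the functional API, browser.wait in the class API):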

await browser.wait('.my-selector', 10000);

Session Timeout

Error: Session not found

Solution: The session has expired or was closed, for example because the WebDriver server was restarted. Create a new session.
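If long-running jobs hit this error regularly, the calling code can recreate the session and retry once. This sketch assumes the error text contains "Session not found", as shown above; W3C drivers report the underlying condition as the `invalid session id` error code, so the check matches either.

`withSessionRetry`, `isSessionLost`, and the `reopen` callback are hypothetical. The helper returns the session ID that was actually used, so the caller can close the right session afterwards.

```typescript
// True for errors meaning the session is gone, rather than the step itself failing.
function isSessionLost(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /session not found|invalid session id/i.test(msg);
}

// Run `step` with the current session. If the session has expired, open a new
// one via `reopen` and retry exactly once. Any other error propagates unchanged.
async function withSessionRetry<T>(
  sessionId: string,
  step: (id: string) => Promise<T>,
  reopen: () => Promise<string>,
): Promise<{ sessionId: string; result: T }> {
  try {
    return { sessionId, result: await step(sessionId) };
  } catch (err) {
    if (!isSessionLost(err)) throw err;
    const fresh = await reopen();
    return { sessionId: fresh, result: await step(fresh) };
  }
}
```

A fresh session starts on a blank page, so `step` should begin by navigating rather than relying on state from the lost session.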

Future Enhancements

  • WebDriver auto-detection and management
  • Built-in ChromeDriver bundling
  • Lightpanda integration for high-performance scenarios
  • WebMCP integration for Chrome 146+ features
  • Screenshot diff comparison
  • Network request interception
  • Cookie and storage management