Browser Automation Integration Guide
Overview
ZCLAW now includes browser automation powered by Fantoccini, a Rust WebDriver client. This lets the Browser Hand drive a web browser for testing, scraping, and other automated tasks.
Architecture
┌─────────────────────────────────────────────────────────────┐
│                      Frontend (React)                       │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  browser-client.ts                                  │    │
│  │  - createSession() / closeSession()                 │    │
│  │  - navigate() / click() / type()                    │    │
│  │  - screenshot() / scrapePage()                      │    │
│  └─────────────────────┬───────────────────────────────┘    │
└────────────────────────┼────────────────────────────────────┘
                         │ Tauri invoke()
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                    Tauri Backend (Rust)                     │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  browser/commands.rs                                │    │
│  │  - Tauri command handlers                           │    │
│  └─────────────────────┬───────────────────────────────┘    │
│                        │                                    │
│  ┌─────────────────────▼───────────────────────────────┐    │
│  │  browser/client.rs                                  │    │
│  │  - BrowserClient (WebDriver connection)             │    │
│  │  - Session management                               │    │
│  │  - Element operations                               │    │
│  └─────────────────────┬───────────────────────────────┘    │
│                        │                                    │
│  ┌─────────────────────▼───────────────────────────────┐    │
│  │  Fantoccini (WebDriver Protocol)                    │    │
│  └─────────────────────┬───────────────────────────────┘    │
└────────────────────────┼────────────────────────────────────┘
                         │ WebDriver Protocol
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                 ChromeDriver / GeckoDriver                  │
│              (Requires separate installation)               │
└─────────────────────────────────────────────────────────────┘
Prerequisites
1. Install WebDriver
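A WebDriver server (ChromeDriver or geckodriver) must be installed and running. The commands below start one on port 4444, which matches the default WebDriver URL: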
# Chrome (ChromeDriver)
# Download from: https://chromedriver.chromium.org/
chromedriver --port=4444
# Firefox (geckodriver)
# Download from: https://github.com/mozilla/geckodriver
geckodriver --port=4444
2. Verify WebDriver is Running
curl http://localhost:4444/status
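The same check can be done from TypeScript before creating a session. The sketch below assumes the driver answers `/status` with the W3C WebDriver shape `{ value: { ready, message } }`. `parseWebDriverStatus` and `checkWebDriver` are hypothetical helper names and are not part of browser-client.ts.

```typescript
// Hypothetical helpers; not part of browser-client.ts.
interface WebDriverStatus {
  ready: boolean;
  message: string;
}

// Extract readiness from a parsed /status response body.
// Assumes the W3C WebDriver shape: { value: { ready, message } }.
function parseWebDriverStatus(body: unknown): WebDriverStatus {
  const value = (body as { value?: { ready?: unknown; message?: unknown } } | null)?.value;
  return {
    ready: value?.ready === true,
    message: typeof value?.message === 'string' ? value.message : '',
  };
}

// Query a WebDriver server; resolves to ready=false if it is unreachable.
async function checkWebDriver(baseUrl = 'http://localhost:4444'): Promise<WebDriverStatus> {
  try {
    const response = await fetch(`${baseUrl}/status`);
    return parseWebDriverStatus(await response.json());
  } catch (err) {
    return { ready: false, message: String(err) };
  }
}
```

Treating an unreachable server as "not ready" instead of throwing keeps the caller's check to a single `if (!status.ready)` branch.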
Usage Examples
Basic Usage (Functional API)
import { createSession, navigate, click, screenshot, closeSession } from './lib/browser-client';

async function example() {
  // Create session
  const { session_id } = await createSession({
    headless: true,
    browserType: 'chrome',
  });

  try {
    // Navigate
    await navigate(session_id, 'https://example.com');

    // Click element
    await click(session_id, 'button.submit');

    // Take screenshot
    const { base64 } = await screenshot(session_id);
    console.log('Screenshot taken, size:', base64.length);
  } finally {
    // Always close session
    await closeSession(session_id);
  }
}
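The create / try / finally pattern above can be factored into a small wrapper so the session is always closed. This is a sketch and not part of browser-client.ts: `withSession` and the `SessionApi` interface are hypothetical, and the API is passed in as a parameter so the helper assumes nothing beyond the calls shown in the example.

```typescript
// Hypothetical wrapper; the real functions live in browser-client.ts.
interface SessionApi {
  createSession(options?: { headless?: boolean }): Promise<{ session_id: string }>;
  closeSession(sessionId: string): Promise<unknown>;
}

// Run `work` with a fresh session and close the session even if `work` throws.
async function withSession<T>(
  api: SessionApi,
  work: (sessionId: string) => Promise<T>,
  options: { headless?: boolean } = { headless: true },
): Promise<T> {
  const { session_id } = await api.createSession(options);
  try {
    return await work(session_id);
  } finally {
    await api.closeSession(session_id);
  }
}
```

With the functional API this might be called as `withSession({ createSession, closeSession }, (id) => navigate(id, 'https://example.com'))`, assuming those functions match the signatures used in the example.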
Using Browser Class (Recommended)
import Browser from './lib/browser-client';

async function scrapeData() {
  const browser = new Browser();

  try {
    // Start browser
    await browser.start({ headless: true });

    // Navigate
    await browser.goto('https://example.com/products');

    // Wait for products to load
    await browser.wait('.product-list', 5000);

    // Scrape product data
    const data = await browser.scrape(
      ['.product-name', '.product-price', '.product-description'],
      '.product-list'
    );
    console.log('Products:', data);
  } finally {
    await browser.close();
  }
}
Form Filling
import Browser from './lib/browser-client';

async function fillForm() {
  const browser = new Browser();

  try {
    await browser.start();
    await browser.goto('https://example.com/login');

    // Fill login form
    await browser.fillForm([
      { selector: 'input[name="email"]', value: 'user@example.com' },
      { selector: 'input[name="password"]', value: 'password123' },
    ], 'button[type="submit"]');

    // Wait for redirect
    await browser.wait('.dashboard', 5000);

    // Take screenshot of logged-in state
    const { base64 } = await browser.screenshot();
  } finally {
    await browser.close();
  }
}
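The `base64` value returned by the screenshot call is left unused above. A minimal sketch of one way to consume it in the React frontend follows, assuming the string is a raw base64-encoded PNG with no `data:` prefix (PNG is the format WebDriver screenshots use). `looksLikePng` and `toPngDataUrl` are hypothetical helpers.

```typescript
// The 8-byte PNG signature (89 50 4E 47 0D 0A 1A 0A) always base64-encodes
// to a string that starts with these ten characters.
const PNG_BASE64_PREFIX = 'iVBORw0KGg';

// Cheap sanity check before displaying or saving the screenshot.
function looksLikePng(base64: string): boolean {
  return base64.startsWith(PNG_BASE64_PREFIX);
}

// Build a data URL that can be assigned to an <img> src attribute.
function toPngDataUrl(base64: string): string {
  return `data:image/png;base64,${base64}`;
}
```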
Integration with Hands System
// In your Hand implementation
import Browser from '../lib/browser-client';

export class BrowserHand implements Hand {
  name = 'browser';
  description = 'Automates web browser interactions';

  async execute(task: BrowserTask): Promise<HandResult> {
    const browser = new Browser();

    try {
      await browser.start({ headless: true });

      switch (task.action) {
        case 'scrape':
          await browser.goto(task.url);
          return { success: true, data: await browser.scrape(task.selectors) };

        case 'screenshot':
          await browser.goto(task.url);
          return { success: true, data: await browser.screenshot() };

        case 'interact':
          await browser.goto(task.url);
          for (const step of task.steps) {
            if (step.type === 'click') await browser.click(step.selector);
            if (step.type === 'type') await browser.type(step.selector, step.value);
          }
          return { success: true };

        default:
          return { success: false, error: 'Unknown action' };
      }
    } finally {
      await browser.close();
    }
  }
}
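The example above relies on `BrowserTask` and `HandResult` types that this guide does not define. One possible shape, inferred only from the fields the example reads (`action`, `url`, `selectors`, `steps`), is a discriminated union. Every name and field below is an assumption.

```typescript
// Assumed shapes, inferred from the fields BrowserHand.execute() reads.
type BrowserStep =
  | { type: 'click'; selector: string }
  | { type: 'type'; selector: string; value: string };

type BrowserTask =
  | { action: 'scrape'; url: string; selectors: string[] }
  | { action: 'screenshot'; url: string }
  | { action: 'interact'; url: string; steps: BrowserStep[] };

interface HandResult {
  success: boolean;
  data?: unknown;
  error?: string;
}

// Summarize a task; the switch narrows `task` to one variant per case.
function describeTask(task: BrowserTask): string {
  switch (task.action) {
    case 'scrape':
      return `scrape ${task.selectors.length} selector(s) from ${task.url}`;
    case 'screenshot':
      return `screenshot ${task.url}`;
    case 'interact':
      return `run ${task.steps.length} step(s) on ${task.url}`;
  }
}
```

Modeling the task as a union lets the compiler check that `task.selectors` is only read for `scrape` and `step.value` only for `type` steps.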
API Reference
Session Management
| Function | Description |
|---|---|
| `createSession(options)` | Create new browser session |
| `closeSession(sessionId)` | Close browser session |
| `listSessions()` | List all active sessions |
| `getSession(sessionId)` | Get session info |
Navigation
| Function | Description |
|---|---|
| `navigate(sessionId, url)` | Navigate to URL |
| `back(sessionId)` | Go back |
| `forward(sessionId)` | Go forward |
| `refresh(sessionId)` | Refresh page |
| `getCurrentUrl(sessionId)` | Get current URL |
| `getTitle(sessionId)` | Get page title |
Element Operations
| Function | Description |
|---|---|
| `findElement(sessionId, selector)` | Find single element |
| `findElements(sessionId, selector)` | Find multiple elements |
| `click(sessionId, selector)` | Click element |
| `typeText(sessionId, selector, text, clearFirst?)` | Type into element |
| `getText(sessionId, selector)` | Get element text |
| `getAttribute(sessionId, selector, attr)` | Get element attribute |
| `waitForElement(sessionId, selector, timeout?)` | Wait for element |
Advanced
| Function | Description |
|---|---|
| `executeScript(sessionId, script, args?)` | Execute JavaScript |
| `screenshot(sessionId)` | Take page screenshot |
| `elementScreenshot(sessionId, selector)` | Take element screenshot |
| `getSource(sessionId)` | Get page HTML source |
High-Level
| Function | Description |
|---|---|
| `scrapePage(sessionId, selectors, waitFor?, timeout?)` | Scrape multiple selectors |
| `fillForm(sessionId, fields, submitSelector?)` | Fill and submit form |
Configuration
Environment Variables
# WebDriver URL (default: http://localhost:4444)
WEBDRIVER_URL=http://localhost:4444
Session Options
interface SessionOptions {
  webdriverUrl?: string;   // WebDriver server URL
  headless?: boolean;      // Run headless (default: true)
  browserType?: 'chrome' | 'firefox' | 'edge' | 'safari';
  windowWidth?: number;    // Window width in pixels
  windowHeight?: number;   // Window height in pixels
}
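The documented defaults (WebDriver URL `http://localhost:4444`, `headless: true`) and the `WEBDRIVER_URL` variable can be combined in one place. The sketch below is illustrative only: `resolveSessionOptions` is a hypothetical helper, and the precedence it applies (explicit option, then environment variable, then default) is an assumption, not confirmed behavior of the backend.

```typescript
interface SessionOptions {
  webdriverUrl?: string;
  headless?: boolean;
  browserType?: 'chrome' | 'firefox' | 'edge' | 'safari';
  windowWidth?: number;
  windowHeight?: number;
}

const DEFAULT_WEBDRIVER_URL = 'http://localhost:4444';

// Fill in the documented defaults. Precedence (explicit option, then
// WEBDRIVER_URL, then the default URL) is an assumption.
function resolveSessionOptions(
  options: SessionOptions = {},
  env: Record<string, string | undefined> = {},
): SessionOptions {
  return {
    ...options,
    webdriverUrl: options.webdriverUrl ?? env['WEBDRIVER_URL'] ?? DEFAULT_WEBDRIVER_URL,
    headless: options.headless ?? true,
  };
}
```

Passing the environment in as a plain record keeps the function usable both in Node-based tooling and in the webview, where `process.env` may not exist.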
Troubleshooting
WebDriver Not Found
Error: WebDriver connection failed
Solution: Ensure ChromeDriver or geckodriver is running on the port that the WebDriver URL points to (4444 by default):
chromedriver --port=4444
# or
geckodriver --port=4444
Element Not Found
Error: Element not found: .my-selector
Solution: Wait for the element with an appropriate timeout, using waitForElement in the functional API or wait() on the Browser class:
await browser.wait('.my-selector', 10000);
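Waiting with a timeout generally means polling until a condition holds or a deadline passes. The sketch below shows that pattern in isolation. `pollUntil` is a hypothetical helper and does not describe how `browser.wait()` or `waitForElement` are implemented.

```typescript
// Poll `check` until it returns true or `timeoutMs` elapses.
// Resolves to true on success and false on timeout.
async function pollUntil(
  check: () => Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    // Sleep before the next attempt so the driver is not flooded with requests.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The condition is always checked at least once, so a timeout of `0` still succeeds when the element is already present.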
Session Timeout
Error: Session not found
Solution: The session may have expired or already been closed. Create a new session with createSession() and retry.
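If sessions can expire mid-task, one option is to recreate the session and retry once. This is a sketch under assumptions: `runWithFreshSessionOnExpiry` is a hypothetical helper, and it assumes the failure surfaces as a rejected promise whose message contains the `Session not found` text shown above.

```typescript
// Assumption: an expired session rejects with a message containing this text.
const SESSION_NOT_FOUND = 'Session not found';

// Rejections may carry an Error or a plain string, so handle both.
function isSessionNotFound(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return message.includes(SESSION_NOT_FOUND);
}

// Run `work`; if the session is gone, create a new one and retry exactly once.
async function runWithFreshSessionOnExpiry<T>(
  sessionId: string,
  createSessionId: () => Promise<string>,
  work: (sessionId: string) => Promise<T>,
): Promise<{ sessionId: string; result: T }> {
  try {
    return { sessionId, result: await work(sessionId) };
  } catch (err) {
    if (!isSessionNotFound(err)) throw err;
    const freshId = await createSessionId();
    return { sessionId: freshId, result: await work(freshId) };
  }
}
```

A new session does not carry over the previous page state, so `work` should include any navigation it depends on.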
Future Enhancements
- WebDriver auto-detection and management
- Built-in ChromeDriver bundling
- Lightpanda integration for high-performance scenarios
- WebMCP integration for Chrome 146+ features
- Screenshot diff comparison
- Network request interception
- Cookie and storage management