Web Agent Annotation

Potato supports both reviewing and creating web agent browsing traces through an interactive interface with SVG overlay visualizations.

Overview

Web agent annotation provides two modes:

Review Mode — View pre-recorded agent browsing traces step-by-step with screenshot overlays showing clicks, bounding boxes, mouse paths, and scroll indicators. Annotators evaluate agent behavior using per-step and trajectory-level annotation schemes.
Creation Mode — Browse websites through a proxied iframe while interactions are automatically recorded to build new agent traces.

Review Mode

Configuration

instance_display:
  fields:
    - key: steps
      type: web_agent_trace
      label: "Agent Browsing Trace"
      display_options:
        show_overlays: true         # Enable SVG overlays (default: true)
        show_filmstrip: true        # Show thumbnail filmstrip bar (default: true)
        show_thought: true          # Show agent's reasoning (default: true)
        show_observation: true      # Show environment observations (default: true)
        show_element_info: true     # Show target element details (default: true)
        screenshot_max_width: 800   # Max screenshot width in pixels (default: 800)
        screenshot_max_height: 600  # Max screenshot height in pixels (default: 600)
        filmstrip_size: 80          # Thumbnail size in pixels (default: 80)
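
Because every display option has a documented default, any of them can be omitted from the configuration. The following is a minimal sketch of how omitted options could be resolved, not Potato's actual implementation: `resolve_display_options` is a hypothetical helper, and only the option names and default values are taken from the configuration above.

```python
# Defaults copied from the comments in the configuration above.
DEFAULT_DISPLAY_OPTIONS = {
    "show_overlays": True,
    "show_filmstrip": True,
    "show_thought": True,
    "show_observation": True,
    "show_element_info": True,
    "screenshot_max_width": 800,
    "screenshot_max_height": 600,
    "filmstrip_size": 80,
}


def resolve_display_options(user_options=None):
    """Hypothetical helper: overlay user-supplied options on the documented defaults."""
    user_options = user_options or {}
    unknown = set(user_options) - set(DEFAULT_DISPLAY_OPTIONS)
    if unknown:
        # Catch typos such as "show_overlay" early instead of silently ignoring them.
        raise ValueError(f"Unknown display options: {sorted(unknown)}")
    return {**DEFAULT_DISPLAY_OPTIONS, **user_options}
```

For example, `resolve_display_options({"screenshot_max_width": 1024})` keeps every other option at its default.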

Data Format

Each instance should contain a steps array with step-level data:

{
  "id": "trace_001",
  "task_description": "Find and add a blue wool sweater to cart",
  "site": "amazon.com",
  "steps": [
    {
      "step_index": 0,
      "screenshot_url": "screenshots/step_000.png",
      "action_type": "click",
      "element": {
        "tag": "input",
        "text": "Search",
        "bbox": [340, 45, 680, 75]
      },
      "coordinates": {"x": 510, "y": 60},
      "mouse_path": [[200, 300], [350, 200], [510, 60]],
      "thought": "I need to search for blue wool sweaters",
      "observation": "Search box is focused",
      "timestamp": 1.2,
      "viewport": {"width": 1280, "height": 720}
    }
  ]
}

Supported Action Types

Action	Description	Overlay
`click`	Mouse click on element	Red circle + crosshair
`type`	Text input	Yellow highlight on target
`scroll`	Page scroll	Green directional arrow
`hover`	Mouse hover	Purple circle
`select`	Dropdown selection	Blue bounding box
`navigate`	URL navigation	—
`wait`	Waiting for page load	—
`done`	Task completion	—
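
Before loading traces into the viewer, it can help to check them against the data format and action types listed above. The sketch below is a hypothetical pre-flight check, not part of Potato. It assumes that `bbox` is ordered `[x1, y1, x2, y2]`, which is consistent with the sample step (the click at 510, 60 is the center of the box) but is not stated explicitly in this document.

```python
# Action types from the table above.
SUPPORTED_ACTIONS = {
    "click", "type", "scroll", "hover", "select", "navigate", "wait", "done",
}


def validate_trace(instance):
    """Return a list of human-readable problems found in one trace instance."""
    steps = instance.get("steps")
    if not isinstance(steps, list) or not steps:
        return ["instance has no non-empty 'steps' array"]

    problems = []
    for position, step in enumerate(steps):
        prefix = f"step {step.get('step_index', position)}"
        action = step.get("action_type")
        if action not in SUPPORTED_ACTIONS:
            problems.append(f"{prefix}: unsupported action_type {action!r}")

        bbox = (step.get("element") or {}).get("bbox")
        point = step.get("coordinates")
        # Assumption: bbox is [x1, y1, x2, y2] in the same pixel space as coordinates.
        if bbox and point:
            x1, y1, x2, y2 = bbox
            if not (x1 <= point["x"] <= x2 and y1 <= point["y"] <= y2):
                problems.append(f"{prefix}: coordinates fall outside element bbox")
    return problems
```

A click outside the target's bounding box is not necessarily invalid, so the result is best treated as a list of warnings to review rather than hard errors.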

SVG Overlays

The viewer renders SVG overlays on top of screenshots:

Click markers — Red circle with crosshair and pulse animation at click coordinates
Bounding boxes — Blue dashed rectangle around the target element's bounding box
Mouse paths — Orange curved line showing the mouse trajectory with animated dash
Scroll indicators — Green arrow showing scroll direction and magnitude
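
Overlay positions come from step data recorded in viewport pixels, while the screenshot is displayed within `screenshot_max_width` and `screenshot_max_height`. The sketch below shows one way the mapping could work. It assumes that the screenshot covers the full viewport and is scaled down with its aspect ratio preserved; the viewer's actual scaling logic is not described here, and both function names are hypothetical.

```python
def fit_scale(viewport_w, viewport_h, max_w=800, max_h=600):
    """Scale factor that fits the viewport inside max_w x max_h without upscaling."""
    return min(max_w / viewport_w, max_h / viewport_h, 1.0)


def to_display(point, viewport, max_w=800, max_h=600):
    """Map a point in viewport pixels to displayed-screenshot pixels."""
    scale = fit_scale(viewport["width"], viewport["height"], max_w, max_h)
    return (point["x"] * scale, point["y"] * scale)


# The sample step: a click at (510, 60) in a 1280x720 viewport.
print(to_display({"x": 510, "y": 60}, {"width": 1280, "height": 720}))
# → (318.75, 37.5)
```

With the default 800x600 limits, a 1280x720 viewport is constrained by width, so the scale factor is 800 / 1280 = 0.625.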

Keyboard Shortcuts

The following shortcuts are active while the viewer has focus (click the viewer to focus it):

Key	Action
`←` / `→`	Previous / Next step
`1`	Toggle click marker overlays
`2`	Toggle bounding box overlays
`3`	Toggle mouse path overlays
`4`	Toggle scroll indicator overlays
`A`	Show all overlays
`N`	Hide all overlays

Per-Step Annotations

Add per_step: true to an annotation scheme to create per-step annotations that appear inline with each step:

annotation_schemes:
  - annotation_type: radio
    name: step_correctness
    per_step: true
    labels:
      - name: correct
      - name: incorrect
      - name: unnecessary

Per-step annotations are stored as {scheme_name}_step_{index} (e.g., step_correctness_step_0).
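
The naming pattern makes it straightforward to regroup exported annotations by step. The following is a minimal sketch under one assumption: that the annotations for an instance are available as a flat mapping of key to value. The export layout is not specified here, and both helper names are hypothetical.

```python
import re
from collections import defaultdict

# Matches keys of the form {scheme_name}_step_{index}.
_STEP_KEY = re.compile(r"^(?P<scheme>.+)_step_(?P<index>\d+)$")


def step_key(scheme_name, index):
    """Build the storage key for a per-step annotation."""
    return f"{scheme_name}_step_{index}"


def group_by_step(annotations):
    """Group a flat {key: value} mapping into {step_index: {scheme_name: value}}."""
    grouped = defaultdict(dict)
    for key, value in annotations.items():
        match = _STEP_KEY.match(key)
        if match:  # keys without the suffix are treated as trajectory-level and skipped
            grouped[int(match["index"])][match["scheme"]] = value
    return dict(grouped)
```

The scheme part of the pattern is greedy, so a scheme whose own name contains `_step_` (such as `step_correctness`) is still split at the final `_step_{index}` suffix.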

Converting Traces

Use the web agent converter to transform traces from other formats into the data format shown under Data Format:

# Convert from file
python -m potato.trace_converter -i traces.json -f web_agent -o output.jsonl

# Auto-detect format
python -m potato.trace_converter -i traces.json --auto-detect -o output.jsonl

Supported input formats:

- WebArena/VisualWebArena — action_type + element in action steps
- Mind2Web — operation + target_html in action steps
- Anthropic Computer Use — Tool blocks with computer_20241022 type
- Raw recordings — Steps with mouse_path + viewport data
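
The distinguishing fields listed above suggest how auto-detection could work. The sketch below is an illustrative heuristic only, not the converter's actual logic. The returned labels, the `actions` fallback key, and the order of the checks are all assumptions.

```python
import json


def guess_trace_format(record):
    """Heuristic guess at the source format of one trace record (a dict)."""
    # Anthropic Computer Use: look for the tool type anywhere in the record.
    if "computer_20241022" in json.dumps(record):
        return "anthropic_computer_use"

    # Assumption: steps live under "steps" (or, hypothetically, "actions").
    steps = record.get("steps") or record.get("actions") or []
    first = steps[0] if steps else {}

    if "operation" in first and "target_html" in first:
        return "mind2web"
    if "mouse_path" in first and "viewport" in first:
        return "raw_recording"
    if "action_type" in first and "element" in first:
        return "webarena"
    return "unknown"
```

Inspecting only the first step keeps the check cheap. A more careful detector would sample several steps, since the first step of a trace may be a `navigate` or `wait` action with few fields.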

Creation Mode

Configuration

instance_display:
  fields:
    - key: browsing_session
      type: web_agent_recorder
      display_options:
        start_url: "https://www.google.com"
        proxy_mode: auto          # auto, iframe, playwright
        record_mouse_path: true
        record_viewport: true
        screenshot_method: server
        max_steps: 50

How It Works

1. The annotator sees a task description and a browser iframe.
2. They browse the website while their interactions are recorded.
3. Clicks, typing, scrolling, and mouse movements are captured.
4. Screenshots are taken at each step.
5. The recording is saved as a structured trace.
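
The flow above can be pictured as steps accumulating into a single trace object. The sketch below is a hypothetical in-memory recorder, not Potato's implementation. It reuses field names from the Review Mode data format and assumes that `timestamp` is measured in seconds since the session started.

```python
import time


class TraceRecorder:
    """Hypothetical recorder that accumulates interaction steps into a trace."""

    def __init__(self, task_description, site, max_steps=50):
        self.trace = {"task_description": task_description, "site": site, "steps": []}
        self.max_steps = max_steps
        self._started = time.monotonic()

    def record(self, action_type, screenshot_url, **details):
        """Append one step; extra fields (coordinates, mouse_path, ...) go in details."""
        steps = self.trace["steps"]
        if len(steps) >= self.max_steps:
            raise RuntimeError(f"max_steps ({self.max_steps}) reached")
        step = {
            "step_index": len(steps),
            "action_type": action_type,
            "screenshot_url": screenshot_url,
            # Assumption: seconds elapsed since the session started.
            "timestamp": round(time.monotonic() - self._started, 3),
            **details,
        }
        steps.append(step)
        return step

    def finish(self):
        """Return the structured trace, ready to be serialized."""
        return self.trace
```

A session would call `record("click", "screenshots/step_000.png", coordinates={"x": 510, "y": 60})` for each interaction and `finish()` at the end.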

Proxy Modes

auto (default) — Automatically detects if the target site allows iframe embedding. Uses iframe proxy if allowed, falls back to Playwright if not.
iframe — Forces iframe proxy mode. Works for ~90% of sites. Fast with <100ms overhead.
playwright — Forces server-side Playwright mode. Works for 100% of sites. Requires playwright package installation.
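
Sites usually signal whether they may be embedded through the `X-Frame-Options` header or the `frame-ancestors` directive of `Content-Security-Policy`. The sketch below shows how an `auto` decision could be derived from those response headers. Whether Potato's detection works this way is an assumption, and both function names are hypothetical.

```python
def allows_iframe(headers):
    """Return False if the response headers forbid framing by another origin."""
    normalized = {key.lower(): value.lower() for key, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block embedding from another origin.
    if normalized.get("x-frame-options", "").strip() in {"deny", "sameorigin"}:
        return False

    # CSP frame-ancestors: conservatively treat anything but a bare wildcard as blocking.
    for directive in normalized.get("content-security-policy", "").split(";"):
        parts = directive.split()
        if parts and parts[0] == "frame-ancestors":
            return parts[1:] == ["*"]
    return True


def choose_proxy_mode(configured, headers):
    """Resolve the proxy_mode setting to 'iframe' or 'playwright'."""
    if configured != "auto":
        return configured  # an explicit setting always wins
    return "iframe" if allows_iframe(headers) else "playwright"
```

For example, `choose_proxy_mode("auto", {"X-Frame-Options": "DENY"})` resolves to `"playwright"`, while a response without either header resolves to `"iframe"`.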

Playwright Setup (Optional)

For sites that block iframe embedding, install Playwright and its Chromium browser:

pip install playwright
playwright install chromium

Example Projects

Review Mode

python potato/flask_server.py start examples/agent-traces/web-agent-review/config.yaml -p 8000

Creation Mode

python potato/flask_server.py start examples/agent-traces/web-agent-creation/config.yaml -p 8000

API Endpoints

Recording API

Endpoint	Method	Description
`/api/web_agent/start_session`	POST	Start recording session
`/api/web_agent/save_step`	POST	Save a recorded step
`/api/web_agent/save_screenshot`	POST	Upload step screenshot
`/api/web_agent/end_session`	POST	End session, save trace
`/api/web_agent/proxy/<url>`	GET	Proxy external URL
`/api/web_agent/check_frameable`	GET	Check iframe compatibility

Related Documentation

Agent Trace Display — Standard agent trace step cards
Configuration Reference — Full configuration options
Trace Converters — Converting between trace formats
