Skip to content

Automation Rules

The automation engine closes the production → eval loop: a programmable filter → sampling rate → actions pipeline runs over every item entering Potato — whether loaded from data_files or ingested at runtime via the trace webhook / tracing SDK. Each matching rule can route the item to the annotation queue, curate it into an eval dataset, run an evaluator, fire an outbound webhook, or notify annotators.

Enabling

automation:
  enabled: true
  rules:
    - name: route-errors
      when: {field: status, in: [error, failed]}
      sample_rate: 1.0          # 0.0–1.0 (default 1.0 = every match)
      actions:
        - {type: add_to_queue, priority: 100, reason: "Agent errored"}
        - {type: add_to_dataset, dataset: errors-to-fix}
        - {type: run_evaluator, evaluator: trajectory_match}
        - {type: fire_webhook, url: "https://example.com/hook"}
        - {type: notify, message: "New error trace"}

Rules

A rule fires for an item when both:

  1. when matches — the shared condition grammar (same as triage): equals, in, contains, exists, lt/lte/gt/gte, dotted field paths (metadata.score). A list of conditions is AND-ed; an empty/absent when matches everything.
  2. sample_rate selects it — deterministic sampling on a hash of (item id, rule name), so re-processing the same item yields the same decision (idempotent, replay-safe). 1.0 = always, 0.0 = never.

Common fields on an ingested trace: metadata.source (webhook/langsmith/langfuse), task_description, plus any top-level fields your payload includes that survive normalization.

Actions

Action When it runs Effect
add_to_queue inline (fast) Boost the item's triage priority so the priority assignment strategy surfaces it. Params: priority, reason.
add_to_dataset inline (fast) Append the item as an example to a dataset (created if absent). Params: dataset.
notify inline (fast) Notify connected annotators via SSE. Params: message.
run_evaluator background worker Score the item with an evaluator; the score is stored on the item (metadata.automation_eval). Params: evaluator, params.
fire_webhook background worker POST {rule, item_id, item_data} to an external URL. Params: url, headers.

Fast actions run inline in the ingestion path (cheap, in-process). Heavy actions (run_evaluator, fire_webhook) are dispatched to a background worker so ingestion never blocks. Every action records an outcome; failures are caught and logged as error outcomes — automation never breaks ingestion.

Ordering note: actions within a rule run in listed order, but heavy actions complete asynchronously, so a fire_webhook may finish after a later inline action. (Mirrors LangSmith's per-rule scheduling caveat.)

Inspecting

The admin dashboard links to Automation (/admin/automation), showing configured rules, activity counters, and recent action outcomes. JSON API:

Path Returns
GET /admin/automation/status rules + counters + per-action breakdown
GET /admin/automation/outcomes?limit=N recent action outcomes

Example

examples/agent-traces/automation-loop/ is a runnable demo:

python potato/flask_server.py start examples/agent-traces/automation-loop/config.yaml -p 8000
curl -X POST http://localhost:8000/api/traces/webhook -H "Content-Type: application/json" \
  -d '{"id":"run-1","task_description":"buy milk","status":"error","steps":[{"action_type":"click"}]}'