User Simulator

The User Simulator enables automated testing of the Potato annotation platform by simulating multiple users with configurable behaviors and competence levels.

Overview

The simulator is useful for:

- Quality control testing: Test attention checks, gold standards, and blocking behavior
- Dashboard testing: Generate realistic annotation data for the admin dashboard
- Scalability testing: Stress test the server with many concurrent users
- AI assistance evaluation: Compare LLM accuracy against human-like behaviors
- Active learning testing: Simulate iterative annotation workflows

Quick Start

# Basic random simulation with 10 users
python -m potato.simulator --server http://localhost:8000 --users 10

# With configuration file
python -m potato.simulator --config simulator-config.yaml --server http://localhost:8000

# Fast scalability test (no waiting between annotations)
python -m potato.simulator --server http://localhost:8000 --users 50 --parallel 10 --fast-mode

Configuration

YAML Configuration File

Create a YAML file with simulator settings:

simulator:
  # User configuration
  users:
    count: 20
    competence_distribution:
      good: 0.5      # 50% will be "good" annotators (80-90% accuracy)
      average: 0.3   # 30% "average" (60-70% accuracy)
      poor: 0.2      # 20% "poor" (40-50% accuracy)

  # Annotation strategy
  strategy: random  # random, biased, llm, pattern, gold_standard

  # Timing configuration
  timing:
    annotation_time:
      min: 2.0
      max: 45.0
      mean: 12.0
      std: 6.0
      distribution: normal  # uniform, normal, exponential

  # Execution
  execution:
    parallel_users: 5
    delay_between_users: 0.5
    max_annotations_per_user: 50

  # Output
  output:
    dir: simulator_output
    format: json

server:
  url: http://localhost:8000
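
One way to read the timing settings is as a clamped draw from the configured distribution. The sketch below is a minimal illustration under that assumption; the helper name `sample_annotation_time` and the clamping to `min`/`max` are hypothetical and not confirmed to match the simulator's internals.

```python
import random

def sample_annotation_time(rng, min_s=2.0, max_s=45.0, mean=12.0, std=6.0,
                           distribution="normal"):
    """Draw one simulated annotation duration in seconds (hypothetical helper)."""
    if distribution == "uniform":
        value = rng.uniform(min_s, max_s)
    elif distribution == "normal":
        value = rng.gauss(mean, std)
    elif distribution == "exponential":
        # expovariate takes the rate (1 / mean), not the mean itself.
        value = rng.expovariate(1.0 / mean)
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    # Assumption: out-of-range draws are clamped to the configured bounds.
    return max(min_s, min(max_s, value))

rng = random.Random(0)  # seeded so repeated runs give the same draws
times = [sample_annotation_time(rng) for _ in range(1000)]
print(f"shortest={min(times):.1f}s longest={max(times):.1f}s")
```

With `--fast-mode`, the waiting step that such a draw would feed is disabled entirely.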

Competence Levels

Level	Accuracy	Description
`perfect`	100%	Always matches gold standard
`good`	80-90%	High-quality annotator
`average`	60-70%	Typical crowdworker
`poor`	40-50%	Low-quality annotator
`random`	~1/N	Random selection among the N available labels
`adversarial`	0%	Intentionally wrong (for testing QC)
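
To make the table concrete, here is a minimal sketch of how a competence level could turn a gold answer into a simulated label. It assumes each decision draws an accuracy uniformly from the level's range and that wrong answers are chosen uniformly among the other labels; `ACCURACY_RANGES` and `simulate_label` are illustrative names, not part of the Potato API.

```python
import random

# Accuracy ranges copied from the table above; "random" is handled separately.
ACCURACY_RANGES = {
    "perfect": (1.0, 1.0),
    "good": (0.8, 0.9),
    "average": (0.6, 0.7),
    "poor": (0.4, 0.5),
    "adversarial": (0.0, 0.0),
}

def simulate_label(rng, competence, gold, labels):
    """Pick a label for one instance given its gold answer (hypothetical helper).

    Requires at least two labels so that a wrong answer exists.
    """
    if competence == "random":
        return rng.choice(labels)
    low, high = ACCURACY_RANGES[competence]
    accuracy = rng.uniform(low, high)
    if rng.random() < accuracy:
        return gold
    # Otherwise pick uniformly among the wrong labels.
    wrong = [label for label in labels if label != gold]
    return rng.choice(wrong)

rng = random.Random(0)
labels = ["positive", "negative", "neutral"]
print(simulate_label(rng, "good", "positive", labels))
```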

Annotation Strategies

Random Strategy (default)

Selects labels uniformly at random:

strategy: random

Biased Strategy

Weighted selection based on label preferences:

strategy: biased
biased_config:
  label_weights:
    positive: 0.6
    negative: 0.3
    neutral: 0.1
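
Weighted selection of this kind can be sketched with the standard library's `random.choices`. This is an illustration of the idea using the weights from the configuration above, not the simulator's own code; `pick_biased_label` is a hypothetical helper.

```python
import random
from collections import Counter

# Weights taken from the biased_config example above.
label_weights = {"positive": 0.6, "negative": 0.3, "neutral": 0.1}

def pick_biased_label(rng, weights):
    """Choose one label with probability proportional to its weight."""
    labels = list(weights)
    return rng.choices(labels, weights=[weights[l] for l in labels], k=1)[0]

rng = random.Random(42)
counts = Counter(pick_biased_label(rng, label_weights) for _ in range(10_000))
# Observed shares should land close to 0.6 / 0.3 / 0.1.
print({label: count / 10_000 for label, count in counts.items()})
```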

LLM Strategy

Uses an LLM to generate annotations based on text content:

strategy: llm
llm_config:
  endpoint_type: openai  # openai, anthropic, ollama, gemini, etc.
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
  add_noise: true      # Occasionally add random noise
  noise_rate: 0.05     # 5% of responses will be random

For local LLMs with Ollama:

strategy: llm
llm_config:
  endpoint_type: ollama
  model: llama3.2
  base_url: http://localhost:11434
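
The `add_noise` and `noise_rate` options in the first LLM configuration can be pictured as a post-processing step on the model's answer. The sketch below assumes that a noisy response is replaced by a uniformly random label; `apply_noise` is a hypothetical name and the real behavior may differ.

```python
import random

def apply_noise(rng, llm_label, labels, noise_rate=0.05):
    """Return the LLM's label, or a random label with probability noise_rate."""
    if rng.random() < noise_rate:
        # The random pick may coincide with the LLM's label by chance.
        return rng.choice(labels)
    return llm_label

rng = random.Random(3)
labels = ["positive", "negative", "neutral"]
noisy = [apply_noise(rng, "positive", labels) for _ in range(1000)]
print(sum(label != "positive" for label in noisy), "of 1000 answers changed")
```

Because the random replacement can match the original label, the share of visibly changed answers is somewhat below `noise_rate`.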

Pattern Strategy

Consistent per-user behavior patterns:

strategy: pattern
pattern_config:
  patterns:
    user_001:
      preferred_label: positive
      bias_strength: 0.8
      keywords:
        happy: positive
        sad: negative
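
The configuration does not spell out how `keywords`, `preferred_label`, and `bias_strength` interact. One plausible reading, sketched below as an assumption only, is that a keyword match wins, otherwise the preferred label is chosen with probability `bias_strength`, and otherwise a label is picked at random. `pick_pattern_label` is a hypothetical helper.

```python
import random

# Pattern for user_001, copied from the configuration above.
pattern = {
    "preferred_label": "positive",
    "bias_strength": 0.8,
    "keywords": {"happy": "positive", "sad": "negative"},
}

def pick_pattern_label(rng, text, pattern, labels):
    """Apply keyword rules first, then the user's label preference (assumed order)."""
    lowered = text.lower()
    for keyword, label in pattern["keywords"].items():
        if keyword in lowered:
            return label
    if rng.random() < pattern["bias_strength"]:
        return pattern["preferred_label"]
    return rng.choice(labels)

rng = random.Random(0)
labels = ["positive", "negative", "neutral"]
print(pick_pattern_label(rng, "I feel sad today", pattern, labels))  # negative
```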

CLI Options

Usage: python -m potato.simulator [OPTIONS]

Required:
  --server, -s URL        Potato server URL

User Configuration:
  --users, -u NUM         Number of simulated users (default: 10)
  --competence DIST       Competence distribution (e.g., good=0.5,average=0.5)

Strategy:
  --strategy TYPE         Strategy: random, biased, llm, pattern (default: random)
  --bias-weights WEIGHTS  Label weights for biased strategy
  --llm-endpoint TYPE     LLM endpoint: openai, anthropic, ollama, etc.
  --llm-model NAME        LLM model name
  --llm-api-key KEY       LLM API key
  --llm-base-url URL      LLM base URL (for local endpoints)

Execution:
  --parallel, -p NUM      Max concurrent users (default: 5)
  --max-annotations, -m   Max annotations per user
  --sequential            Run users sequentially
  --fast-mode             Disable waiting between annotations

Output:
  --output-dir, -o DIR    Output directory (default: simulator_output)
  --no-export             Don't export results to files

Other:
  --gold-file PATH        Gold standard answers file
  --config, -c PATH       YAML configuration file
  --verbose, -v           Enable debug logging
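
The `--competence` value is a comma-separated list of `level=share` pairs. A minimal sketch of parsing such a string is shown below; `parse_competence` is a hypothetical helper, and the check that shares sum to 1.0 is an assumption rather than documented simulator behavior.

```python
def parse_competence(spec):
    """Parse 'good=0.5,average=0.5' into a dict of level -> share."""
    distribution = {}
    for part in spec.split(","):
        level, _, share = part.partition("=")
        # float() raises ValueError if the share is missing or malformed.
        distribution[level.strip()] = float(share)
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"shares sum to {total}, expected 1.0")
    return distribution

print(parse_competence("good=0.5,average=0.5"))
```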

Working Without Gold Standards

When no gold standards are available:

- Competence levels affect consistency but not accuracy measurement
- Random strategy selects uniformly from available labels
- Biased strategy selects according to configured weights
- LLM strategy generates annotations based on text content

To use gold standards for testing accuracy:

python -m potato.simulator --server http://localhost:8000 --gold-file gold_standards.json

Gold standard file format:

[
  {"id": "instance_001", "sentiment": "positive"},
  {"id": "instance_002", "sentiment": "negative"}
]
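
A file in this format can be scored against a set of annotations with a few lines of Python. The sketch below assumes that each entry's non-`id` key names an annotation schema (here `sentiment`); the helper names and the sample annotations are illustrative only.

```python
import json

gold_json = """
[
  {"id": "instance_001", "sentiment": "positive"},
  {"id": "instance_002", "sentiment": "negative"}
]
"""

def load_gold(text, schema="sentiment"):
    """Map instance id -> gold label for one annotation schema."""
    return {entry["id"]: entry[schema] for entry in json.loads(text)}

def accuracy(annotations, gold):
    """Share of annotated gold instances whose label matches the gold answer."""
    scored = [(iid, label) for iid, label in annotations.items() if iid in gold]
    if not scored:
        return None  # nothing to score against
    correct = sum(1 for iid, label in scored if gold[iid] == label)
    return correct / len(scored)

gold = load_gold(gold_json)
# Hypothetical annotations from one simulated user.
annotations = {"instance_001": "positive", "instance_002": "positive"}
print(accuracy(annotations, gold))  # 0.5
```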

Quality Control Testing

Test attention check detection:

simulator:
  users:
    count: 10
    competence_distribution:
      adversarial: 1.0  # All users are adversarial (intentionally wrong)
  quality_control:
    attention_check_fail_rate: 0.5  # 50% fail attention checks
    respond_fast_rate: 0.3          # 30% suspiciously fast responses

Output Files

After simulation, results are exported to the output directory:

summary_{timestamp}.json - Aggregate statistics
user_results_{timestamp}.json - Per-user detailed results
annotations_{timestamp}.csv - All annotations in flat format

Summary Example

{
  "user_count": 20,
  "total_annotations": 400,
  "total_time_seconds": 125.3,
  "attention_checks": {
    "passed": 18,
    "failed": 2,
    "pass_rate": 0.9
  },
  "gold_standards": {
    "correct": 35,
    "incorrect": 5,
    "accuracy": 0.875
  }
}
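
The rate fields in the summary follow from the counts next to them, which makes them easy to sanity-check in a test. The sketch below recomputes them from the example above; the `rate` helper is hypothetical, and treating `pass_rate` and `accuracy` as simple ratios is an assumption that happens to match the example values.

```python
import json
import math

summary = json.loads("""
{
  "user_count": 20,
  "total_annotations": 400,
  "total_time_seconds": 125.3,
  "attention_checks": {"passed": 18, "failed": 2, "pass_rate": 0.9},
  "gold_standards": {"correct": 35, "incorrect": 5, "accuracy": 0.875}
}
""")

def rate(hits, misses):
    """Return hits / (hits + misses), or None when there is nothing to count."""
    total = hits + misses
    return hits / total if total else None

checks = summary["attention_checks"]
gold = summary["gold_standards"]
# Both comparisons hold for the example values (18/20 and 35/40).
print(math.isclose(rate(checks["passed"], checks["failed"]), checks["pass_rate"]))
print(math.isclose(rate(gold["correct"], gold["incorrect"]), gold["accuracy"]))
print(summary["total_annotations"] / summary["user_count"], "annotations per user")
```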

Programmatic Usage

from potato.simulator import SimulatorManager, SimulatorConfig

# Create configuration
config = SimulatorConfig(
    user_count=10,
    strategy="random",
    competence_distribution={"good": 0.5, "average": 0.5}
)

# Create and run simulator
manager = SimulatorManager(config, "http://localhost:8000")
results = manager.run_parallel(max_annotations_per_user=20)

# Print summary
manager.print_summary()

# Export results
manager.export_results()

Integration with Tests

The simulator can be used in pytest fixtures:

import pytest
import requests  # used by the dashboard test below
from potato.simulator import SimulatorManager, SimulatorConfig

@pytest.fixture
def simulated_annotations(flask_test_server):
    """Generate simulated annotations for testing."""
    config = SimulatorConfig(user_count=5, strategy="random")
    manager = SimulatorManager(config, flask_test_server.base_url)
    return manager.run_parallel(max_annotations_per_user=10)

def test_dashboard_shows_annotations(simulated_annotations, flask_test_server):
    """Verify dashboard shows simulated data."""
    # Check admin API
    response = requests.get(f"{flask_test_server.base_url}/admin/api/overview")
    assert response.json()["total_annotations"] > 0

Example Configurations

See example configuration files in:

- examples/simulator-configs/simulator-random.yaml
- examples/simulator-configs/simulator-biased.yaml
- examples/simulator-configs/simulator-ollama.yaml

Troubleshooting

Login failures

Ensure the server allows anonymous registration or has `require_password: false`
Check server logs for authentication errors

No instances available

Verify data files are loaded correctly
Check assignment strategy settings

LLM strategy not working

Verify API key is set (via config or environment variable)
For Ollama, ensure the server is running at the configured URL
Check model name is correct
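
Before digging into simulator settings, it can help to confirm that the Potato server or the Ollama endpoint answers at all. The sketch below uses only the standard library; `is_reachable` is a hypothetical helper, and the URLs are the defaults used elsewhere on this page.

```python
import urllib.error
import urllib.request

def is_reachable(url, timeout=2.0):
    """Return True if an HTTP GET to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False

if __name__ == "__main__":
    for name, url in [("potato", "http://localhost:8000"),
                      ("ollama", "http://localhost:11434")]:
        print(name, "reachable" if is_reachable(url) else "not reachable")
```

A `not reachable` result points to a server or network problem rather than a simulator configuration issue.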