User Simulator
The User Simulator enables automated testing of the Potato annotation platform by simulating multiple users with configurable behaviors and competence levels.
Overview
The simulator is useful for:
- Quality control testing: Test attention checks, gold standards, and blocking behavior
- Dashboard testing: Generate realistic annotation data for the admin dashboard
- Scalability testing: Stress test the server with many concurrent users
- AI assistance evaluation: Compare LLM accuracy against human-like behaviors
- Active learning testing: Simulate iterative annotation workflows
Quick Start
# Basic random simulation with 10 users
python -m potato.simulator --server http://localhost:8000 --users 10
# With configuration file
python -m potato.simulator --config simulator-config.yaml --server http://localhost:8000
# Fast scalability test (no waiting between annotations)
python -m potato.simulator --server http://localhost:8000 --users 50 --parallel 10 --fast-mode
Configuration
YAML Configuration File
Create a YAML file with simulator settings:
simulator:
  # User configuration
  users:
    count: 20
    competence_distribution:
      good: 0.5     # 50% will be "good" annotators (80-90% accuracy)
      average: 0.3  # 30% "average" (60-70% accuracy)
      poor: 0.2     # 20% "poor" (40-50% accuracy)
  # Annotation strategy
  strategy: random  # random, biased, llm, pattern, gold_standard
  # Timing configuration
  timing:
    annotation_time:
      min: 2.0
      max: 45.0
      mean: 12.0
      std: 6.0
      distribution: normal  # uniform, normal, exponential
  # Execution
  execution:
    parallel_users: 5
    delay_between_users: 0.5
    max_annotations_per_user: 50
  # Output
  output:
    dir: simulator_output
    format: json
server:
  url: http://localhost:8000
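To make the timing settings above concrete, the following sketch draws a per-annotation delay from the configured distribution and clamps it to the `min`/`max` bounds. The helper name `sample_annotation_time` and the clamping step are assumptions for illustration; the simulator's own sampling code may behave differently.

```python
import random

def sample_annotation_time(cfg, rng=None):
    """Draw one delay in seconds from a timing dict (hypothetical helper)."""
    rng = rng or random.Random()
    lo, hi = cfg["min"], cfg["max"]
    dist = cfg.get("distribution", "uniform")
    if dist == "normal":
        value = rng.gauss(cfg["mean"], cfg["std"])
    elif dist == "exponential":
        # expovariate expects a rate, i.e. 1 / mean
        value = rng.expovariate(1.0 / cfg["mean"])
    elif dist == "uniform":
        value = rng.uniform(lo, hi)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    # Assumed behavior: clamp outliers into the configured bounds.
    return max(lo, min(hi, value))

timing = {"min": 2.0, "max": 45.0, "mean": 12.0, "std": 6.0, "distribution": "normal"}
rng = random.Random(0)  # seeded so runs are repeatable
samples = [sample_annotation_time(timing, rng) for _ in range(1000)]
```

Clamping keeps every delay inside the configured window at the cost of small probability spikes at the two bounds; resampling out-of-range draws would be an alternative.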
Competence Levels
| Level | Accuracy | Description |
|---|---|---|
| perfect | 100% | Always matches gold standard |
| good | 80-90% | High-quality annotator |
| average | 60-70% | Typical crowdworker |
| poor | 40-50% | Low-quality annotator |
| random | ~1/N | Random selection from labels |
| adversarial | 0% | Intentionally wrong (for testing QC) |
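One way to picture these levels: when a gold label is known, a simulated user returns it with a probability equal to the level's accuracy and otherwise picks one of the other labels. The sketch below is only an illustration of that idea. The `ACCURACY` midpoints and the `pick_label` helper are assumptions, not the simulator's actual implementation.

```python
import random

# Assumed midpoints of the accuracy ranges in the table; "random" is handled separately.
ACCURACY = {"perfect": 1.0, "good": 0.85, "average": 0.65, "poor": 0.45, "adversarial": 0.0}

def pick_label(level, gold, labels, rng):
    """Return a label for one item given a competence level (hypothetical helper)."""
    if level == "random":
        return rng.choice(labels)
    wrong = [label for label in labels if label != gold]
    # rng.random() is in [0, 1), so "perfect" never errs and "adversarial" always does.
    if wrong and rng.random() >= ACCURACY[level]:
        return rng.choice(wrong)
    return gold

rng = random.Random(42)
labels = ["positive", "negative", "neutral"]
picks = [pick_label("good", "positive", labels, rng) for _ in range(2000)]
observed = picks.count("positive") / len(picks)  # should land near 0.85
```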
Annotation Strategies
Random Strategy (default)
Selects labels uniformly at random:
strategy: random
Biased Strategy
Weighted selection based on label preferences:
strategy: biased
biased_config:
  label_weights:
    positive: 0.6
    negative: 0.3
    neutral: 0.1
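Weighted selection of this kind can be sketched with the standard library's `random.choices`. The `biased_choice` helper below is hypothetical and only illustrates how the `label_weights` values translate into sampling frequencies.

```python
import random
from collections import Counter

label_weights = {"positive": 0.6, "negative": 0.3, "neutral": 0.1}

def biased_choice(weights, rng):
    """Pick one label with probability proportional to its weight (hypothetical helper)."""
    labels = list(weights)
    return rng.choices(labels, weights=[weights[label] for label in labels], k=1)[0]

rng = random.Random(7)
counts = Counter(biased_choice(label_weights, rng) for _ in range(5000))
# Over many draws, "positive" should be most frequent and "neutral" least.
```

`random.choices` normalizes the weights itself, so they do not strictly need to sum to 1.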
LLM Strategy
Uses an LLM to generate annotations based on text content:
strategy: llm
llm_config:
  endpoint_type: openai  # openai, anthropic, ollama, gemini, etc.
  model: gpt-4o-mini
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
  add_noise: true   # Occasionally add random noise
  noise_rate: 0.05  # 5% of responses will be random
For local LLMs with Ollama:
strategy: llm
llm_config:
  endpoint_type: ollama
  model: llama3.2
  base_url: http://localhost:11434
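The `add_noise` and `noise_rate` settings can be read as a thin wrapper around the model's answer. The sketch below shows one plausible interpretation; `with_noise` is a hypothetical helper and no real LLM call is made.

```python
import random

def with_noise(llm_label, labels, noise_rate, rng):
    """Replace the LLM's label with a uniformly random one at the given rate."""
    if rng.random() < noise_rate:
        return rng.choice(labels)
    return llm_label

labels = ["positive", "negative", "neutral"]
rng = random.Random(3)
noisy = [with_noise("positive", labels, 0.05, rng) for _ in range(10000)]
changed = sum(label != "positive" for label in noisy)
```

Under this reading, a random replacement can coincide with the original answer, so the share of visibly changed labels is somewhat below `noise_rate`.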
Pattern Strategy
Consistent per-user behavior patterns:
strategy: pattern
pattern_config:
patterns:
user_001:
preferred_label: positive
bias_strength: 0.8
keywords:
happy: positive
sad: negative
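The exact semantics of these fields are not spelled out here, so the following sketch is a guess at one reasonable reading: a keyword found in the text decides the label, and otherwise the user returns `preferred_label` with probability `bias_strength`. The `pattern_label` helper and this precedence order are assumptions.

```python
import random

def pattern_label(text, labels, preferred_label, bias_strength, keywords, rng):
    """Keyword hit wins; otherwise lean toward the preferred label (assumed semantics)."""
    lowered = text.lower()
    for word, label in keywords.items():
        if word in lowered:
            return label
    if rng.random() < bias_strength:
        return preferred_label
    return rng.choice(labels)

labels = ["positive", "negative", "neutral"]
keywords = {"happy": "positive", "sad": "negative"}
rng = random.Random(11)
result = pattern_label("I feel sad today", labels, "positive", 0.8, keywords, rng)
```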
CLI Options
Usage: python -m potato.simulator [OPTIONS]

Required:
  --server, -s URL         Potato server URL

User Configuration:
  --users, -u NUM          Number of simulated users (default: 10)
  --competence DIST        Competence distribution (e.g., good=0.5,average=0.5)

Strategy:
  --strategy TYPE          Strategy: random, biased, llm, pattern (default: random)
  --bias-weights WEIGHTS   Label weights for biased strategy
  --llm-endpoint TYPE      LLM endpoint: openai, anthropic, ollama, etc.
  --llm-model NAME         LLM model name
  --llm-api-key KEY        LLM API key
  --llm-base-url URL       LLM base URL (for local endpoints)

Execution:
  --parallel, -p NUM       Max concurrent users (default: 5)
  --max-annotations, -m    Max annotations per user
  --sequential             Run users sequentially
  --fast-mode              Disable waiting between annotations

Output:
  --output-dir, -o DIR     Output directory (default: simulator_output)
  --no-export              Don't export results to files

Other:
  --gold-file PATH         Gold standard answers file
  --config, -c PATH        YAML configuration file
  --verbose, -v            Enable debug logging
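The `--competence` value uses a compact `name=share` syntax. As a rough illustration of how such a string could be turned into per-level user counts, consider the sketch below. Both helpers are hypothetical, and the requirement that shares sum to 1.0 is an assumption rather than documented CLI behavior.

```python
def parse_distribution(spec):
    """Parse 'good=0.5,average=0.5' into {'good': 0.5, 'average': 0.5}."""
    dist = {}
    for part in spec.split(","):
        name, _, value = part.partition("=")
        dist[name.strip()] = float(value)
    total = sum(dist.values())
    if abs(total - 1.0) > 1e-6:  # assumed validation rule
        raise ValueError(f"shares sum to {total}, expected 1.0")
    return dist

def allocate_users(dist, user_count):
    """Split user_count across levels; leftover users go to the largest remainders."""
    raw = {name: share * user_count for name, share in dist.items()}
    counts = {name: int(value) for name, value in raw.items()}
    leftover = user_count - sum(counts.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - counts[n], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

dist = parse_distribution("good=0.5,average=0.3,poor=0.2")
print(allocate_users(dist, 20))
```

Largest-remainder rounding guarantees the counts always add up to the requested number of users, which naive rounding does not.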
Working Without Gold Standards
When no gold standards are available:
- Competence levels affect consistency but not accuracy measurement
- Random strategy selects uniformly from available labels
- Biased strategy selects according to configured weights
- LLM strategy generates annotations based on text content
To use gold standards for testing accuracy:
python -m potato.simulator --server http://localhost:8000 --gold-file gold_standards.json
Gold standard file format:
[
  {"id": "instance_001", "sentiment": "positive"},
  {"id": "instance_002", "sentiment": "negative"}
]
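To show how a file in this format relates to an accuracy figure, the sketch below loads gold labels for one schema and scores a set of annotations against them. The helpers `load_gold` and `accuracy` and the `simulated` dictionary are illustrative assumptions, not part of the simulator's API.

```python
import json

def load_gold(text, schema):
    """Map instance id -> gold label for one annotation schema."""
    return {row["id"]: row[schema] for row in json.loads(text) if schema in row}

def accuracy(annotations, gold):
    """Fraction of gold-covered annotations that match the gold label."""
    scored = [(iid, label) for iid, label in annotations.items() if iid in gold]
    if not scored:
        return None  # nothing to score against
    return sum(gold[iid] == label for iid, label in scored) / len(scored)

gold_text = """[
  {"id": "instance_001", "sentiment": "positive"},
  {"id": "instance_002", "sentiment": "negative"}
]"""
gold = load_gold(gold_text, "sentiment")

# Hypothetical simulated output: one hit, one miss, one item without a gold label.
simulated = {"instance_001": "positive", "instance_002": "positive", "instance_003": "neutral"}
print(accuracy(simulated, gold))  # 0.5
```

Instances without a gold entry are ignored rather than counted as wrong, which matches the usual meaning of accuracy on a gold subset.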
Quality Control Testing
Test attention check detection:
simulator:
  users:
    count: 10
    competence_distribution:
      adversarial: 1.0  # All users will fail
  quality_control:
    attention_check_fail_rate: 0.5  # 50% fail attention checks
    respond_fast_rate: 0.3          # 30% suspiciously fast responses
Output Files
After simulation, results are exported to the output directory:
- summary_{timestamp}.json - Aggregate statistics
- user_results_{timestamp}.json - Per-user detailed results
- annotations_{timestamp}.csv - All annotations in flat format
Summary Example
{
  "user_count": 20,
  "total_annotations": 400,
  "total_time_seconds": 125.3,
  "attention_checks": {
    "passed": 18,
    "failed": 2,
    "pass_rate": 0.9
  },
  "gold_standards": {
    "correct": 35,
    "incorrect": 5,
    "accuracy": 0.875
  }
}
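The rate fields in this summary follow from the counts next to them (18 / 20 = 0.9 and 35 / 40 = 0.875). A small consistency check, using only the fields shown in the example and a hypothetical `rate_matches` helper, might look like this:

```python
import json

summary = json.loads("""{
  "user_count": 20,
  "total_annotations": 400,
  "total_time_seconds": 125.3,
  "attention_checks": {"passed": 18, "failed": 2, "pass_rate": 0.9},
  "gold_standards": {"correct": 35, "incorrect": 5, "accuracy": 0.875}
}""")

def rate_matches(block, ok_key, bad_key, rate_key):
    """True if the stored rate equals ok / (ok + bad)."""
    total = block[ok_key] + block[bad_key]
    expected = block[ok_key] / total if total else 0.0
    return abs(expected - block[rate_key]) < 1e-9

checks_ok = rate_matches(summary["attention_checks"], "passed", "failed", "pass_rate")
gold_ok = rate_matches(summary["gold_standards"], "correct", "incorrect", "accuracy")
per_user = summary["total_annotations"] / summary["user_count"]  # 20.0 in this example
```

A check like this can be useful in a test suite to catch summaries whose counts and rates drift apart.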
Programmatic Usage
from potato.simulator import SimulatorManager, SimulatorConfig
# Create configuration
config = SimulatorConfig(
    user_count=10,
    strategy="random",
    competence_distribution={"good": 0.5, "average": 0.5},
)
# Create and run simulator
manager = SimulatorManager(config, "http://localhost:8000")
results = manager.run_parallel(max_annotations_per_user=20)
# Print summary
manager.print_summary()
# Export results
manager.export_results()
Integration with Tests
The simulator can be used in pytest fixtures:
import pytest
import requests
from potato.simulator import SimulatorManager, SimulatorConfig


@pytest.fixture
def simulated_annotations(flask_test_server):
    """Generate simulated annotations for testing."""
    config = SimulatorConfig(user_count=5, strategy="random")
    manager = SimulatorManager(config, flask_test_server.base_url)
    return manager.run_parallel(max_annotations_per_user=10)


def test_dashboard_shows_annotations(simulated_annotations, flask_test_server):
    """Verify dashboard shows simulated data."""
    # Check admin API
    response = requests.get(f"{flask_test_server.base_url}/admin/api/overview")
    assert response.json()["total_annotations"] > 0
Example Configurations
See example configuration files in:
- examples/simulator-configs/simulator-random.yaml
- examples/simulator-configs/simulator-biased.yaml
- examples/simulator-configs/simulator-ollama.yaml
Troubleshooting
Login failures
- Ensure the server allows anonymous registration or has require_password: false
- Check server logs for authentication errors
No instances available
- Verify data files are loaded correctly
- Check assignment strategy settings
LLM strategy not working
- Verify API key is set (via config or environment variable)
- For Ollama, ensure the server is running at the configured URL
- Check model name is correct