Crowdsourcing Integration
Potato can be seamlessly deployed online to collect annotations from common crowdsourcing platforms like Prolific and Amazon Mechanical Turk.
Setup Potato on a Server
To run Potato in a crowdsourcing setup, you need to deploy Potato on a server with open ports (ports accessible via the open internet). When you start the Potato server, simply change the default port to an openly accessible port:
potato start your-project -p 8080
You should then be able to access the annotation page via your_ip_address:the_port.
Prolific Integration
Prolific is a platform where you can easily recruit task participants. Potato provides seamless integration with Prolific through:
- URL-Direct Login: Automatic login using Prolific's URL parameters
- Completion Codes: Built-in support for completion code display and redirect
- Prolific API Integration: Optional automatic study management via Prolific's API
Quick Start: Minimal Prolific Setup
For a basic Prolific integration, you only need to add two configuration options to your YAML file:
# Enable URL-direct login (extracts PROLIFIC_PID from URL)
login:
type: url_direct
url_argument: PROLIFIC_PID # Optional, this is the default
# Your Prolific completion code
completion_code: "YOUR-PROLIFIC-CODE"
With this configuration:
- Workers arriving from Prolific are automatically logged in using their PROLIFIC_PID
- No password is required (automatically disabled for URL-direct login)
- When workers complete all tasks, they see a completion page with:
- The completion code displayed prominently (click to copy)
- A "Return to Prolific" button that redirects them to submit their completion
URL-Direct Login
When workers click your study link on Prolific, they arrive at a URL like:
https://your-server:8080/?PROLIFIC_PID=abc123&SESSION_ID=xyz789&STUDY_ID=study456
Potato automatically: 1. Extracts the worker ID from the URL parameter 2. Creates a session for the worker (no login form needed) 3. Initializes their annotation state 4. Routes them to the first phase (consent, instructions, or annotation)
Configuration Options
login:
type: url_direct # or "prolific" for full API integration
url_argument: PROLIFIC_PID # URL parameter name to extract username from
# Can be customized for MTurk (workerId) etc.
Supported Login Types
| Type | Description |
|---|---|
standard |
Default login form (username/password) |
url_direct |
Extract username from URL parameter |
prolific |
URL-direct login + Prolific API integration |
Completion Code Setup
Potato provides a built-in completion page that displays your Prolific completion code:
# Simple completion code (shown on done page)
completion_code: "YOUR-PROLIFIC-CODE"
# Optional: Auto-redirect to Prolific after completion
auto_redirect_on_completion: true
auto_redirect_delay: 5000 # milliseconds (default: 5000)
The completion page features:
- Large, prominent display of the completion code
- Click-to-copy functionality
- "Return to Prolific" button that redirects to https://app.prolific.com/submissions/complete?cc=YOUR-CODE
- Optional auto-redirect after a configurable delay
Full Configuration Example
Here's a complete configuration for a Prolific study:
# Task identification
annotation_task_name: "My Prolific Study"
# Prolific login settings
login:
type: url_direct
url_argument: PROLIFIC_PID
# Completion settings
completion_code: "ABC123XYZ"
auto_redirect_on_completion: false # Set to true for auto-redirect
# Hide navigation to prevent workers from skipping
hide_navbar: true
jumping_to_id_disabled: true
# Phase configuration
phases:
order:
- consent
- instructions
- annotation
consent:
file: data/consent.html
instructions:
file: data/instructions.html
annotation:
type: annotation
# Automatic task assignment for crowdsourcing
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
# Data files
data_files:
- data/items.json
# Output
output_annotation_dir: annotation_output/
Advanced: Prolific API Integration
For larger studies, Potato can integrate with Prolific's API to automatically manage your study:
- Release slots from returned/timed-out/rejected workers
- Pause the study when server load is high
- Resume when load decreases
- Automatically pause when the study is complete
When Prolific submission status is refreshed, Potato also releases any
assigned-but-unannotated items from workers whose submissions are RETURNED,
TIMED-OUT, or REJECTED, so abandoned batches can be assigned to other workers.
Annotations already completed by those workers are preserved.
If your study should treat dropout reasons differently, configure
instance_reclaim. For example, preserve completed annotations from timed-out
workers but discard annotations from rejected workers:
instance_reclaim:
enabled: true
prolific:
preserve_completed_annotations: true
status_policies:
TIMED-OUT:
preserve_completed_annotations: true
REJECTED:
preserve_completed_annotations: false
Setup Prolific API
-
Get your Prolific API token from your Prolific account settings
-
Create a Prolific config file (
configs/prolific_config.yaml):
token: "your-prolific-api-token"
study_id: "your-study-id"
max_concurrent_sessions: 30 # Maximum concurrent workers
workload_checker_period: 300 # Seconds between load checks
- Reference it in your main config:
login:
type: prolific # Enables both URL-direct login and API integration
prolific:
config_file_path: configs/prolific_config.yaml
completion_code: "YOUR-PROLIFIC-CODE"
Server Workload Management
When many workers access your study concurrently, your server might become overloaded. Potato automatically manages this:
- Checks active worker count against
max_concurrent_sessions - Pauses the Prolific study if threshold is exceeded
- Monitors worker count every
workload_checker_periodseconds - Resumes when active workers drop to 20% of the maximum
# In prolific_config.yaml
max_concurrent_sessions: 30 # Pause study above this threshold
workload_checker_period: 300 # Check every 5 minutes
Note: The Prolific API may have performance issues with studies over 200 participants. For very large studies, consider running multiple smaller batches.
Alternative: Using Surveyflow for Completion
If you need more control over the completion page, you can use surveyflow instead of the built-in completion page:
- Create an end page (
surveyflow/end.jsonl):
{"id":"1","text":"Thanks for your participation! Click the link below to complete the study.","schema": "pure_display", "choices": ["<a href=\"https://app.prolific.com/submissions/complete?cc=YOUR-CODE\" target=\"_blank\">Complete Study on Prolific</a>"]}
- Configure surveyflow in your YAML:
surveyflow:
on: true
order:
- pre_annotation
- post_annotation
pre_annotation:
- surveyflow/consent.jsonl
post_annotation:
- surveyflow/end.jsonl
Amazon Mechanical Turk Integration
Amazon Mechanical Turk (MTurk) is a popular crowdsourcing marketplace for human intelligence tasks. Potato provides full integration with MTurk through:
- URL-Direct Login: Automatic login using MTurk's URL parameters
- Preview Mode: Proper handling of HIT preview state
- Form Submission: Built-in completion flow with form submission to MTurk
- MTurk API Integration: Optional automatic assignment management via AWS API
Quick Start: Minimal MTurk Setup
For a basic MTurk integration, add these configuration options to your YAML file:
# Enable URL-direct login (extracts workerId from URL)
login:
type: url_direct
url_argument: workerId
# Optional: completion code to record
completion_code: "COMPLETED"
# Hide navigation for crowdsourcing
hide_navbar: true
jumping_to_id_disabled: true
With this configuration:
- Workers arriving from MTurk are automatically logged in using their workerId
- Workers previewing the HIT see a preview page prompting them to accept
- When workers complete all tasks, they see a completion page with a "Submit HIT to MTurk" button
URL Parameters
When workers click your HIT on MTurk, they arrive at a URL like:
https://your-server:8080/?workerId=A1B2C3D4E5&assignmentId=xyz123&hitId=abc456&turkSubmitTo=https://www.mturk.com/mturk/externalSubmit
Potato automatically captures these MTurk parameters:
| Parameter | Description |
|---|---|
workerId |
Worker's unique MTurk ID (used as username) |
assignmentId |
Unique ID for this worker-HIT assignment |
hitId |
The HIT identifier |
turkSubmitTo |
URL where the form should be submitted on completion |
HIT Preview Mode
When assignmentId equals ASSIGNMENT_ID_NOT_AVAILABLE, the worker is previewing the HIT (hasn't accepted it yet). Potato displays a preview page explaining that they need to accept the HIT to begin working.
Completion Flow
When workers complete the task, they see a completion page with:
- A "Submit HIT to MTurk" button that POSTs to the turkSubmitTo URL
- The completion code (if configured) displayed for reference
- The assignmentId is automatically included in the form submission
Setting Up Your External Question HIT
To use Potato with MTurk, you need to create an External Question HIT. Use this XML format:
<?xml version="1.0" encoding="UTF-8"?>
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
<ExternalURL>https://your-server:8080/?workerId=${workerId}&assignmentId=${assignmentId}&hitId=${hitId}&turkSubmitTo=${turkSubmitTo}</ExternalURL>
<FrameHeight>800</FrameHeight>
</ExternalQuestion>
MTurk automatically substitutes the placeholder variables (${workerId}, etc.) with actual values.
Full MTurk Configuration Example
Here's a complete configuration for an MTurk study:
# Task identification
annotation_task_name: "Text Classification Task"
task_description: "Classify the sentiment of short texts."
# MTurk login settings
login:
type: url_direct
url_argument: workerId
# Optional completion code
completion_code: "TASK_COMPLETE"
# Hide navigation to prevent workers from skipping
hide_navbar: true
jumping_to_id_disabled: true
# Automatic task assignment for crowdsourcing
assignment_strategy: random
max_annotations_per_user: 10
max_annotations_per_item: 3
# Data files
data_files:
- data/items.json
# Output
output_annotation_dir: annotation_output/
Sandbox vs Production
MTurk provides a sandbox environment for testing. The endpoints are:
| Environment | Worker Site | Submit URL |
|---|---|---|
| Sandbox | workersandbox.mturk.com | https://workersandbox.mturk.com/mturk/externalSubmit |
| Production | www.mturk.com | https://www.mturk.com/mturk/externalSubmit |
For testing:
1. Create HITs on the sandbox requester site
2. Access as a worker on the sandbox worker site
3. The turkSubmitTo parameter will automatically point to the sandbox
Advanced: MTurk API Integration
For larger studies, Potato can integrate with the MTurk API via boto3 to automatically manage your HIT:
- List and monitor assignments
- Auto-approve completed assignments
- Track worker status
Setup MTurk API
- Install boto3:
pip install boto3
-
Get AWS credentials with MTurk permissions from your AWS account
-
Create an MTurk config file (
configs/mturk_config.yaml):
aws_access_key_id: "YOUR_AWS_ACCESS_KEY"
aws_secret_access_key: "YOUR_AWS_SECRET_KEY"
sandbox: true # Set to false for production
hit_id: "YOUR_HIT_ID" # Optional: for tracking/auto-approval
- Reference it in your main config:
login:
type: url_direct
url_argument: workerId
mturk:
enabled: true
config_file_path: configs/mturk_config.yaml
Security Note: Never commit AWS credentials to version control. Use environment variables or AWS credential files in production.
Step-by-Step MTurk Setup Guide
1. Create Your HIT on MTurk
Go to MTurk Requester (or Sandbox for testing).
Create an External Question HIT with: - Your Potato server URL as the ExternalURL - Appropriate reward and time settings - Required qualifications for quality control
2. Configure Your Potato Project
Add the MTurk settings to your configuration:
login:
type: url_direct
url_argument: workerId
completion_code: "COMPLETED"
hide_navbar: true
jumping_to_id_disabled: true
3. Test in Sandbox
- Create a HIT in the MTurk sandbox
- Run your Potato server
- Access the HIT as a sandbox worker
- Verify the full workflow:
- Preview page shows when HIT not accepted
- Login works when HIT is accepted
- Completion page has working submit button
4. Deploy to Production
Once tested: 1. Deploy your Potato server to a production environment 2. Create production HITs pointing to your server 3. Monitor completions via the admin dashboard
Step-by-Step Prolific Setup Guide
1. Create Your Study on Prolific
Go to Prolific and create a new study. Note your study ID from the URL.
2. Configure Your Potato Project
Add the Prolific settings to your configuration:
login:
type: url_direct
url_argument: PROLIFIC_PID
completion_code: "YOUR-CODE-FROM-PROLIFIC"
hide_navbar: true
jumping_to_id_disabled: true
3. Set Up Task Assignment
Configure automatic assignment for crowdsourcing:
assignment_strategy: random # or "ordered"
max_annotations_per_user: 10
max_annotations_per_item: 3
4. Test Locally
Run your project locally and test with a simulated Prolific URL:
potato start your-project -p 8000
Then visit: http://localhost:8000/?PROLIFIC_PID=test_user
5. Deploy to Server
Upload your project to a server with open ports:
# On your server
git clone your-repo
cd your-project
potato start . -p 8080
6. Configure Prolific Study URL
In Prolific, set your study URL to:
https://your-server-ip:8080/?PROLIFIC_PID={{%PROLIFIC_PID%}}&SESSION_ID={{%SESSION_ID%}}&STUDY_ID={{%STUDY_ID%}}
Prolific will automatically replace the placeholders with actual values.
7. Preview and Launch
Use Prolific's preview feature to test the full worker experience, then launch your study!
Troubleshooting
Workers see "Missing required URL parameter" error
- Ensure your Prolific study URL includes the
PROLIFIC_PIDparameter - Check that
login.typeis set tourl_directorprolific
Workers can't log in
- Verify
require_passwordis not set totrue(it's auto-disabled for URL-direct login) - Check server logs for authentication errors
Completion code not showing
- Ensure
completion_codeis set in your config - Verify workers are reaching the DONE phase
Prolific API not working
- Verify your API token is correct
- Check that
study_idmatches your actual Prolific study - Look for error messages in the server logs
MTurk workers stuck on preview page
- Workers see the preview page when they haven't accepted the HIT yet
- The page auto-refreshes every 3 seconds to check if HIT was accepted
- Ensure your External Question URL includes all required parameters
MTurk submit button not working
- Verify
turkSubmitToparameter is present in the original URL - Check that the form action URL is correct (sandbox vs production)
- Look for CORS or mixed-content errors in browser console
MTurk API not connecting
- Verify AWS credentials are correct
- Check that
sandboxsetting matches your environment - Ensure boto3 is installed:
pip install boto3 - Verify your AWS account has MTurk requester permissions
Example Projects
Check out these example projects in the potato-showcase repository:
- prolific_api_example: Full Prolific API integration example
- match_finding: Simple URL-direct login setup
- summarization_evaluation: Crowdsourcing with automatic assignment
Each example includes configuration files you can adapt for your own studies.