Training Phase Documentation
The training phase is an optional component of the annotation workflow that allows administrators to provide users with practice questions and feedback before they begin the main annotation task. This helps ensure annotation quality and user understanding of the task requirements.
Overview
The training phase provides:
- Practice Questions: Users answer questions with known correct answers
- Immediate Feedback: Users receive feedback on their answers with explanations
- Retry Functionality: Users can retry incorrect answers until they get them right
- Progress Tracking: Administrators can monitor training completion and performance
- Quality Assurance: Only users who pass training can proceed to annotation
Configuration
Basic Training Configuration
To enable the training phase, add a training section to your YAML configuration:
```yaml
training:
  enabled: true
  data_file: "training_data.json"
  annotation_schemes: ["sentiment", "topic"]
  passing_criteria:
    min_correct: 3
    require_all_correct: false
  allow_retry: true
  failure_action: "repeat_training"  # or "move_to_done"
```
Configuration Options
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | Yes | false | Whether to enable the training phase |
| data_file | string | Yes* | - | Path to training data file (required if enabled) |
| annotation_schemes | list | No | All schemes | Which annotation schemes to use in training |
| passing_criteria.min_correct | integer | No | 3 | Minimum correct answers required to pass |
| passing_criteria.require_all_correct | boolean | No | false | Whether all questions must be correct |
| passing_criteria.max_mistakes | integer | No | -1 | Maximum total mistakes before failure (-1 = unlimited) |
| passing_criteria.max_mistakes_per_question | integer | No | -1 | Maximum mistakes per question before failure (-1 = unlimited) |
| allow_retry | boolean | No | true | Whether to allow retrying incorrect answers |
| failure_action | string | No | "move_to_done" | Action when user fails ("move_to_done" or "repeat_training") |
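To make the defaults in the table concrete, here is a minimal Python sketch of how a parsed `training` section could be completed with them. It is not Potato's own loader: `apply_training_defaults`, `TRAINING_DEFAULTS`, and `VALID_FAILURE_ACTIONS` are hypothetical names introduced for illustration, and the validation rules are simply a reading of the table.

```python
import copy

# Defaults copied from the options table. "annotation_schemes" is omitted
# because its default ("all schemes") depends on the rest of the config.
TRAINING_DEFAULTS = {
    "enabled": False,
    "passing_criteria": {
        "min_correct": 3,
        "require_all_correct": False,
        "max_mistakes": -1,
        "max_mistakes_per_question": -1,
    },
    "allow_retry": True,
    "failure_action": "move_to_done",
}

VALID_FAILURE_ACTIONS = {"move_to_done", "repeat_training"}


def apply_training_defaults(training: dict) -> dict:
    """Return a copy of a parsed `training` section with documented defaults filled in.

    Hypothetical helper for illustration; not part of Potato.
    """
    merged = copy.deepcopy(TRAINING_DEFAULTS)
    for key, value in training.items():
        if key == "passing_criteria":
            # Merge nested criteria so unspecified ones keep their defaults.
            merged["passing_criteria"].update(value or {})
        else:
            merged[key] = value
    # data_file is marked "Yes*" in the table: required only when enabled.
    if merged["enabled"] and not merged.get("data_file"):
        raise ValueError("training.data_file is required when training.enabled is true")
    if merged["failure_action"] not in VALID_FAILURE_ACTIONS:
        raise ValueError(f"unknown failure_action: {merged['failure_action']!r}")
    return merged
```

Calling the helper with only `enabled`, `data_file`, and one overridden criterion returns a dictionary in which every other option carries its table default.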
Training Strategies
Potato supports multiple training strategies that can be combined:
- Minimum Correct: User must get at least N answers correct to pass

  ```yaml
  passing_criteria:
    min_correct: 3
  ```

- Require All Correct: User must answer every question correctly

  ```yaml
  passing_criteria:
    require_all_correct: true
  ```

- Maximum Mistakes: User is kicked out after N total mistakes

  ```yaml
  passing_criteria:
    max_mistakes: 5  # Fail after 5 wrong answers total
  ```

- Maximum Mistakes Per Question: User is kicked out after N mistakes on any single question

  ```yaml
  passing_criteria:
    max_mistakes_per_question: 2  # Fail after 2 wrong answers on same question
  ```

- Allow Retry: Let users retry incorrect answers

  ```yaml
  allow_retry: true
  ```

These can be combined for complex qualification requirements. For example:
```yaml
passing_criteria:
  min_correct: 3                # Need 3 correct
  max_mistakes: 5               # But no more than 5 total mistakes
  max_mistakes_per_question: 2  # And no more than 2 per question
```
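How the combined criteria interact can be sketched in a few lines of Python. This is an illustration, not Potato's implementation: `PassingCriteria` and `evaluate_training` are hypothetical names. The sketch assumes that a limit of N means the N-th mistake triggers failure (following the "Fail after 5 wrong answers total" comment), that mistake limits are checked before the pass condition, and that `require_all_correct` overrides `min_correct`.

```python
from dataclasses import dataclass


@dataclass
class PassingCriteria:
    """Mirror of the documented passing_criteria options (hypothetical class)."""
    min_correct: int = 3
    require_all_correct: bool = False
    max_mistakes: int = -1               # -1 means unlimited
    max_mistakes_per_question: int = -1  # -1 means unlimited


def evaluate_training(criteria, correct, mistakes_by_question, total_questions):
    """Return "failed", "passed", or "in_progress" for one user.

    correct: number of questions answered correctly so far
    mistakes_by_question: dict mapping question id -> number of wrong attempts
    total_questions: number of training questions in the data file
    """
    total_mistakes = sum(mistakes_by_question.values())
    worst_question = max(mistakes_by_question.values(), default=0)

    # Assumption: limits below 1 are treated as disabled, and reaching the
    # limit (not exceeding it) is what causes failure.
    if 0 < criteria.max_mistakes <= total_mistakes:
        return "failed"
    if 0 < criteria.max_mistakes_per_question <= worst_question:
        return "failed"

    # Assumption: require_all_correct takes precedence over min_correct.
    needed = total_questions if criteria.require_all_correct else criteria.min_correct
    return "passed" if correct >= needed else "in_progress"
```

With the combined example above (`min_correct: 3`, `max_mistakes: 5`, `max_mistakes_per_question: 2`), a user with three correct answers and one mistake passes under these assumptions, while a second mistake on the same question fails them regardless of their correct count.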
Phase Integration
Add the training phase to your workflow by including it in the phases order:
```yaml
phases:
  order: ["consent", "instructions", "training", "annotation"]
  consent:
    type: "consent"
    file: "consent.json"
  instructions:
    type: "instructions"
    file: "instructions.json"
  training:
    type: "training"
    file: "training.json"
```
Training Data Format
Training data is stored in a JSON file with the following structure:
```json
{
  "training_instances": [
    {
      "id": "train_1",
      "text": "This is a positive sentiment text.",
      "correct_answers": {
        "sentiment": "positive",
        "topic": "emotion"
      },
      "explanation": "This text expresses positive emotions and opinions."
    },
    {
      "id": "train_2",
      "text": "This is a negative sentiment text.",
      "correct_answers": {
        "sentiment": "negative",
        "topic": "emotion"
      },
      "explanation": "This text expresses negative emotions and opinions."
    }
  ]
}
```
Training Instance Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique identifier for the training instance |
| text | string | Yes | The text to be annotated |
| correct_answers | object | Yes | Map of schema names to correct values |
| explanation | string | No | Explanation shown when user answers incorrectly |
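The field requirements above lend themselves to a quick check before the server starts. The following Python sketch is one way to do that; `validate_training_data` and `REQUIRED_FIELDS` are hypothetical names, and the duplicate-id check is an interpretation of "unique identifier" rather than documented Potato behavior.

```python
import json

# Fields marked "Yes" in the Required column of the table.
REQUIRED_FIELDS = ("id", "text", "correct_answers")


def validate_training_data(raw: str) -> list:
    """Parse training JSON text and return a list of problems (empty if none)."""
    data = json.loads(raw)
    instances = data.get("training_instances") if isinstance(data, dict) else None
    if not isinstance(instances, list) or not instances:
        return ["'training_instances' must be a non-empty list"]

    problems = []
    seen_ids = set()
    for index, instance in enumerate(instances):
        for name in REQUIRED_FIELDS:
            if name not in instance:
                problems.append(f"instance {index}: missing required field '{name}'")
        instance_id = instance.get("id")
        if instance_id is not None:
            if instance_id in seen_ids:
                problems.append(f"instance {index}: duplicate id '{instance_id}'")
            seen_ids.add(instance_id)
        # correct_answers is documented as an object (a JSON map).
        if not isinstance(instance.get("correct_answers", {}), dict):
            problems.append(f"instance {index}: 'correct_answers' must be an object")
    return problems
```

Passing the contents of the training data file to this function gives a list that is empty when every instance has the required fields, which makes it easy to wire into a pre-deployment script.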
Correct Answers Format
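Each key in correct_answers must be the name of one of your annotation schemes, and each value must take the form that scheme produces. The // comments below only label the scheme types; JSON does not allow comments, so leave them out of the actual data file:
```jsonc
{
  "sentiment": "positive",            // Radio button selection
  "topic": ["emotion", "personal"],   // Multi-select options
  "rating": 4,                        // Numeric rating
  "text_field": "example response"    // Text input
}
```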
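To illustrate how a submission could be compared against these value types, here is a small Python sketch. The comparison rules are assumptions, not Potato's documented behavior: multi-select lists are compared without regard to order, and all other values must match exactly. `answer_matches` and `check_submission` are hypothetical names.

```python
def answer_matches(expected, given) -> bool:
    """Compare one scheme's submitted value against its expected value."""
    if isinstance(expected, list):
        # Multi-select: assume the order of selected options does not matter.
        return isinstance(given, (list, tuple, set)) and set(expected) == set(given)
    # Radio selections, numeric ratings, and text input: assume exact equality.
    return expected == given


def check_submission(correct_answers: dict, submission: dict) -> dict:
    """Return {scheme_name: True/False} for every scheme in correct_answers.

    A scheme missing from the submission counts as incorrect.
    """
    return {
        scheme: answer_matches(expected, submission.get(scheme))
        for scheme, expected in correct_answers.items()
    }
```

A per-scheme result like this also shows which part of a multi-scheme question was wrong, which is useful when writing the explanation text for that instance.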
User Experience
Training Interface
Users see a dedicated training interface with:
- Clear indication they're in the training phase
- The training question text
- Annotation forms matching the main task
- Submit button to answer the question
Feedback System
After submitting an answer, users receive:
Correct Answers:
- Green feedback message: "Correct! Moving to next question."
- Automatic progression to the next question
- No retry option needed
Incorrect Answers:
- Red feedback message with explanation
- Option to retry the question (if enabled)
- Clear explanation of why their answer was wrong
Progress Tracking
Users can see their progress through:
- Current question number
- Total questions remaining
- Training completion status
Admin Monitoring
Dashboard Statistics
The admin dashboard shows training statistics for each user:
- Training Completed: Whether the user has passed training
- Correct Answers: Number of correct answers given
- Total Attempts: Total number of attempts across all questions
- Pass Rate: Percentage of correct answers relative to total attempts (for example, 2 correct answers in 3 attempts is reported as 66.67)
- Current Question: Which question the user is currently on
- Total Questions: Total number of training questions
API Endpoints
Training statistics are available through the admin API:
```
# Get all annotators with training stats
GET /admin/api/annotators

# Get specific user state including training
GET /admin/user_state/{user_id}
```
Example API Response
```json
{
  "annotators": [
    {
      "user_id": "user123",
      "phase": "TRAINING",
      "training_completed": false,
      "training_correct_answers": 2,
      "training_total_attempts": 3,
      "training_pass_rate": 66.67,
      "training_current_question": 2,
      "training_total_questions": 5
    }
  ]
}
```
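A response of this shape is easy to post-process in a monitoring script. The Python sketch below works on the response body as text, so it makes no assumption about how the request is authenticated or sent. `summarize_training` is a hypothetical helper, and the pass-rate formula (correct answers divided by total attempts) is inferred from the example values rather than taken from a specification.

```python
import json


def summarize_training(response_text: str) -> list:
    """Return one summary dict per annotator found in the response body."""
    payload = json.loads(response_text)
    summaries = []
    for annotator in payload.get("annotators", []):
        attempts = annotator.get("training_total_attempts", 0)
        correct = annotator.get("training_correct_answers", 0)
        # Recompute the pass rate locally; users with no attempts get 0.0
        # instead of a division-by-zero error.
        pass_rate = round(100.0 * correct / attempts, 2) if attempts else 0.0
        summaries.append({
            "user_id": annotator["user_id"],
            "completed": bool(annotator.get("training_completed", False)),
            "pass_rate": pass_rate,
        })
    return summaries
```

Comparing the recomputed `pass_rate` with the reported `training_pass_rate` is a cheap consistency check when debugging training statistics.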
Best Practices
Designing Training Questions
- Start Simple: Begin with straightforward examples
- Cover All Cases: Include examples for each possible answer
- Clear Explanations: Provide helpful explanations for incorrect answers
- Realistic Examples: Use examples similar to the actual annotation task
- Appropriate Difficulty: Set reasonable passing criteria
Configuration Recommendations
- Enable Retries: Allow users to learn from mistakes
- Set Reasonable Criteria: Don't require 100% accuracy unless necessary
- Use Explanations: Help users understand why answers are correct/incorrect
- Monitor Performance: Use admin dashboard to track training effectiveness
Training Data Guidelines
- Consistent Format: Ensure training data matches your annotation schemes
- Clear Examples: Use unambiguous examples with obvious correct answers
- Comprehensive Coverage: Include examples for all possible annotation values
- Helpful Explanations: Provide explanations that help users understand the task
Troubleshooting
Common Issues
Training not appearing:
- Check that training.enabled is set to true
- Verify the training phase is in the phases order
- Ensure the training data file exists and is valid
Training data not loading:
- Check the data_file path is correct
- Verify the JSON format is valid
- Ensure annotation schemes match between config and training data
Users stuck in training:
- Check passing criteria are reasonable
- Verify training data has correct answers
- Monitor admin dashboard for training progress
Feedback not showing:
- Check allow_retry setting
- Verify explanations are provided in training data
- Ensure training template is properly configured
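The checks listed under "Training not appearing" and "Training data not loading" can be run as a script before starting the server. The Python sketch below operates on an already parsed configuration dictionary; `diagnose_training` is a hypothetical helper, and it assumes `data_file` is resolved relative to the current working directory, which may differ from how Potato resolves paths.

```python
import json
import os


def diagnose_training(config: dict) -> list:
    """Return human-readable findings for a parsed config dict (empty if none)."""
    findings = []
    training = config.get("training") or {}

    if training.get("enabled") is not True:
        findings.append("training.enabled is not set to true")

    order = (config.get("phases") or {}).get("order") or []
    if "training" not in order:
        findings.append("'training' is missing from phases.order")

    # Assumption: the path is relative to the current working directory.
    path = training.get("data_file")
    if not path or not os.path.isfile(path):
        findings.append(f"training data file not found: {path!r}")
    else:
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
        except json.JSONDecodeError as error:
            findings.append(f"training data file is not valid JSON: {error}")
    return findings
```

An empty result means these particular checks passed; it does not cover template configuration or scheme mismatches, which still need to be checked by hand.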
Debugging
Use the admin dashboard to:
- Monitor user training progress
- Check training statistics
- Verify training data loading
- Track user phase progression
Example Configurations
Basic Sentiment Training
```yaml
training:
  enabled: true
  data_file: "sentiment_training.json"
  annotation_schemes: ["sentiment"]
  passing_criteria:
    min_correct: 2
    require_all_correct: false
  allow_retry: true
  failure_action: "repeat_training"
```
Advanced Multi-Scheme Training
```yaml
training:
  enabled: true
  data_file: "advanced_training.json"
  annotation_schemes: ["sentiment", "topic", "confidence"]
  passing_criteria:
    min_correct: 5
    require_all_correct: false
  allow_retry: true
  failure_action: "repeat_training"
```
Strict Training (No Retries)
```yaml
training:
  enabled: true
  data_file: "strict_training.json"
  annotation_schemes: ["sentiment"]
  passing_criteria:
    min_correct: 3
    require_all_correct: true
  allow_retry: false
  failure_action: "move_to_done"
```
Integration with Existing Workflows
The training phase reuses the components of existing annotation workflows:
- Phase Progression: Users automatically advance through phases
- State Persistence: Training progress is saved and restored
- Admin Monitoring: Training stats appear in existing admin interfaces
- Template System: Uses existing template and styling systems
- Authentication: Works with existing authentication systems
Performance Considerations
- Training data is loaded once at server startup
- Training state is stored in memory (same as other user state)
- No additional database requirements
- Minimal performance impact on existing functionality
Security
- Training data is validated against annotation schemes
- User training state is isolated per user
- Admin access controls apply to training statistics
- No sensitive data exposure through training interface