Solo Mode
Solo Mode lets a single annotator label large datasets efficiently by annotating collaboratively with an LLM.
Overview
Solo Mode provides a streamlined workflow where a human annotator works alongside an LLM to annotate data. The system learns from human feedback, progressively improving its predictions until the human can step back and let the LLM complete the remaining annotations autonomously.
Key Features
- Prompt Synthesis: Automatically generate annotation guidelines from task descriptions
- Edge Case Testing: Generate and label difficult examples to refine prompts
- Parallel Annotation: Human and LLM annotate simultaneously
- Disagreement Resolution: Resolve conflicts between human and LLM labels
- Uncertainty-Based Selection: Prioritize instances where the LLM is uncertain
- Progressive Autonomy: Transition to autonomous LLM labeling as agreement improves
- Final Validation: Validate a sample of LLM-only labels
- Edge Case Rule Discovery: Co-DETECT-inspired automatic rule extraction from low-confidence predictions
- Labeling Functions: ALCHEmist-style pattern extraction for zero-cost labeling via majority voting
- Confidence Routing: Cascaded model escalation from cheap to expensive models
- Refinement Loop: Automated confusion analysis → guideline suggestion → prompt revision cycles
- Confusion Analysis: Enriched confusion patterns with root cause analysis
- Prompt Optimizer: DSPy-style automatic prompt improvement from labeled examples
Configuration
Enable Solo Mode in your project's config.yaml:
solo_mode:
  enabled: true
  # Models for labeling (tried in order)
  labeling_models:
    - endpoint_type: "anthropic"
      model: "claude-3-5-sonnet-20241022"
      api_key: "${ANTHROPIC_API_KEY}"
    - endpoint_type: "openai"
      model: "gpt-4o-mini"
      api_key: "${OPENAI_API_KEY}"
  # Models for prompt revision
  revision_models:
    - endpoint_type: "anthropic"
      model: "claude-3-5-sonnet-20241022"
  # Uncertainty estimation strategy
  uncertainty:
    strategy: "direct_confidence"  # Options: direct_confidence, direct_uncertainty, token_entropy, sampling_diversity
    num_samples: 5  # For sampling_diversity
    sampling_temperature: 1.0  # For sampling_diversity
  # Thresholds
  thresholds:
    end_human_annotation_agreement: 0.90  # Required agreement rate to stop human annotation
    minimum_validation_sample: 50  # Minimum comparisons before ending
    confidence_low: 0.5  # Low confidence threshold
    confidence_high: 0.8  # High confidence threshold
    periodic_review_interval: 100  # Review LLM labels every N instances
  # Instance selection weights (must sum to 1.0)
  instance_selection:
    low_confidence_weight: 0.4  # Prioritize uncertain instances
    diversity_weight: 0.3  # Prioritize diverse instances
    random_weight: 0.2  # Random sample for calibration
    disagreement_weight: 0.1  # Prioritize prior disagreements
  # Batch sizes
  batches:
    llm_labeling_batch: 50  # Instances to label per batch
    max_parallel_labels: 200  # Max LLM labels ahead of human
  # Prompt optimization (optional)
  prompt_optimization:
    enabled: true
    find_smallest_model: true
    target_accuracy: 0.85
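To make the thresholds concrete, here is a minimal Python sketch of how the agreement and confidence settings could be applied. The `Thresholds` class and both function names are hypothetical and not part of Solo Mode's API. Treating the comparisons as inclusive (`>=`) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults copied from the `thresholds` block in the config above.
    end_human_annotation_agreement: float = 0.90
    minimum_validation_sample: int = 50
    confidence_low: float = 0.5
    confidence_high: float = 0.8


def can_end_human_annotation(agreements: int, comparisons: int, t: Thresholds) -> bool:
    """True once enough human-LLM comparisons exist and agreement meets the bar."""
    if comparisons < t.minimum_validation_sample:
        return False
    return agreements / comparisons >= t.end_human_annotation_agreement


def confidence_band(confidence: float, t: Thresholds) -> str:
    """Bucket a confidence score in [0, 1] into low / medium / high."""
    if confidence < t.confidence_low:
        return "low"
    if confidence >= t.confidence_high:
        return "high"
    return "medium"
```

Under these assumptions, 46 agreements out of 50 comparisons (92%) would allow the human to step back, while 40 out of 40 would not, because the minimum sample has not been reached.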
Workflow Phases
Solo Mode progresses through the following phases:
1. Setup
- Enter task description
- Upload data file
- Generate initial prompt
2. Prompt Review
- Review and edit the generated prompt
- Add clarifying examples
- Refine edge case handling
3. Edge Case Synthesis
- LLM generates difficult examples
- Examples test boundary conditions
- Helps identify prompt weaknesses
4. Edge Case Labeling
- Label the synthesized edge cases
- Labels used to improve prompt
- Validates prompt clarity
5. Prompt Validation
- LLM relabels edge cases with improved prompt
- Verify prompt improvements work
- Iterate if necessary
6. Parallel Annotation
- Human and LLM annotate simultaneously
- LLM labels instances in background
- Human labels prioritized instances
7. Disagreement Resolution
- Review instances where human and LLM disagree
- Decide on correct label
- Improve understanding of edge cases
8. Periodic Review
- Periodically review low-confidence LLM labels
- Approve or correct predictions
- Maintain quality during autonomous phase
9. Autonomous Labeling
- Agreement threshold reached
- LLM completes remaining instances
- Human monitors progress
10. Final Validation
- Validate sample of LLM-only labels
- Confirm quality meets standards
- Export final dataset
Uncertainty Estimation
Solo Mode uses uncertainty estimation to prioritize which instances the human should label. Four strategies are available:
Direct Confidence
Asks the LLM to rate its confidence (0-100). Simple and works with all models.
uncertainty:
  strategy: "direct_confidence"
Direct Uncertainty
Asks the LLM to rate its uncertainty directly. Alternative framing that may work better for some models.
uncertainty:
  strategy: "direct_uncertainty"
Token Entropy
Uses the entropy of the answer token probabilities. More objective, but requires an endpoint that exposes logprobs (for example OpenAI or vLLM).
uncertainty:
  strategy: "token_entropy"
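As an illustration of the idea, the sketch below computes Shannon entropy over the log probabilities of the candidate answer tokens. It is not Solo Mode's implementation: the function names, the input shape (a mapping from label token to logprob), and the normalization to [0, 1] are all assumptions.

```python
import math


def token_entropy(logprobs: dict[str, float]) -> float:
    """Shannon entropy (in nats) over candidate answer tokens.

    Probabilities are renormalized, since the candidates returned by an
    endpoint usually do not cover the full vocabulary.
    """
    probs = [math.exp(lp) for lp in logprobs.values()]
    total = sum(probs)
    if total <= 0:
        raise ValueError("need at least one token with nonzero probability")
    entropy = 0.0
    for p in probs:
        q = p / total
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


def normalized_uncertainty(logprobs: dict[str, float]) -> float:
    """Scale entropy to [0, 1] by dividing by log(k), the maximum for k options."""
    k = len(logprobs)
    if k < 2:
        return 0.0
    return token_entropy(logprobs) / math.log(k)
```

With two equally likely labels the normalized value is 1.0 (maximally uncertain). A 99/1 split gives a value close to 0.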
Sampling Diversity
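Runs the LLM num_samples times at high temperature and measures how much the sampled labels vary. Usually the most accurate strategy, but also the most expensive, since each instance costs num_samples LLM calls.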
uncertainty:
  strategy: "sampling_diversity"
  num_samples: 5
  sampling_temperature: 1.0
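One simple way to turn sampled labels into an uncertainty score is the share of samples that disagree with the majority label. The sketch below assumes that measure. The metric Solo Mode actually uses is not documented here, and the function name is hypothetical.

```python
from collections import Counter


def sampling_diversity(labels: list[str]) -> float:
    """Fraction of sampled labels that differ from the most common label.

    Returns 0.0 when every sample agrees; larger values mean more disagreement.
    """
    if not labels:
        raise ValueError("need at least one sampled label")
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)
```

For example, with `num_samples: 5`, the samples `["pos", "pos", "pos", "neg", "neutral"]` have a majority of three, so two of the five samples disagree.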
Instance Selection
Instances are selected for human annotation using a weighted mixture:
| Pool | Weight | Description |
|---|---|---|
| Low Confidence | 40% | Instances where the LLM is uncertain |
| Diverse | 30% | Instances from different embedding clusters |
| Random | 20% | Random sample for calibration |
| Disagreement | 10% | Instances with prior human-LLM disagreement |
Adjust the weights in the config; they must sum to 1.0:
instance_selection:
  low_confidence_weight: 0.4
  diversity_weight: 0.3
  random_weight: 0.2
  disagreement_weight: 0.1
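The weighted mixture can be pictured as choosing a pool for each instance shown to the human, with probability proportional to its weight. The sketch below is an assumption about how such a sampler might look, not the actual selection code. The pool names, `validate_weights`, and `pick_pool` are hypothetical.

```python
import random

# Default weights from the table above, keyed by hypothetical pool names.
WEIGHTS = {
    "low_confidence": 0.4,
    "diversity": 0.3,
    "random": 0.2,
    "disagreement": 0.1,
}


def validate_weights(weights: dict[str, float], tol: float = 1e-9) -> None:
    """Raise ValueError unless weights are non-negative and sum to 1.0."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"weights must sum to 1.0, got {total}")


def pick_pool(weights: dict[str, float], non_empty: set[str], rng: random.Random) -> str:
    """Choose the pool for the next instance, skipping pools with no candidates."""
    candidates = [name for name in weights if name in non_empty and weights[name] > 0]
    if not candidates:
        raise ValueError("no non-empty pool with positive weight")
    return rng.choices(candidates, weights=[weights[n] for n in candidates], k=1)[0]
```

Skipping empty pools renormalizes the remaining weights implicitly. This matters early on, for instance when no disagreements have been recorded yet.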
API Endpoints
Solo Mode provides API endpoints for monitoring and control:
Status
GET /solo/api/status
Returns the current phase, annotation statistics, and agreement metrics.
Prompts
GET /solo/api/prompts
Returns prompt version history.
Predictions
GET /solo/api/predictions
Returns all LLM predictions.
Control
POST /solo/api/advance-phase
POST /solo/api/pause-labeling
POST /solo/api/resume-labeling
POST /solo/api/optimize-prompt
Export
GET /solo/api/export
Exports all annotations and predictions.
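The endpoints above can be called from any HTTP client. The following sketch uses only the Python standard library. The base URL, the helper names, and the assumption that responses are JSON are illustrative and not confirmed by this document. Your deployment may also require authentication.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server's address


def build_request(path: str, method: str = "GET", base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a request for a Solo Mode endpoint path."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, method=method)


def call(path: str, method: str = "GET", base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    request = build_request(path, method, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage against a running server:
#   status = call("/solo/api/status")
#   call("/solo/api/pause-labeling", method="POST")
```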
Best Practices
Writing Good Task Descriptions
- Be specific about what you're labeling
- Define each label clearly
- Explain what makes something ambiguous
- Include examples of edge cases
Prompt Refinement
- Start with the generated prompt
- Add examples for difficult cases
- Clarify ambiguous criteria
- Keep prompts concise but complete
Monitoring Progress
- Check agreement rate regularly
- Review confusion patterns
- Adjust prompts when accuracy drops
- Validate LLM-only labels periodically
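If you export the human and LLM labels, the agreement rate and the most common confusions can be checked offline. The sketch below assumes the data has already been loaded as a list of `(human_label, llm_label)` pairs. The function names are hypothetical, and the actual export format is not described here.

```python
from collections import Counter


def agreement_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (human_label, llm_label) pairs where both labels match."""
    if not pairs:
        raise ValueError("no comparisons yet")
    return sum(human == llm for human, llm in pairs) / len(pairs)


def confusion_counts(pairs: list[tuple[str, str]]) -> list[tuple[tuple[str, str], int]]:
    """Count each (human_label, llm_label) disagreement, most frequent first."""
    return Counter((human, llm) for human, llm in pairs if human != llm).most_common()
```

A confusion that dominates the list, such as the LLM repeatedly choosing one label where the human chose another, usually points to a guideline that needs a clarifying example.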
Advanced Features
Solo Mode includes several advanced subsystems for automated quality improvement, cost optimization, and deeper analysis. These features work together to progressively refine annotation quality with minimal human effort.
Key advanced capabilities include:
- Edge case rule discovery and labeling functions for reducing LLM API costs
- Confidence routing for cascading instances through cheap-to-expensive model tiers
- Automated refinement loops that analyze confusion patterns and revise prompts
- Schema-specific agreement thresholds for Likert, multiselect, text, and span annotations
For full configuration details, see Solo Mode Advanced Features.
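To give a rough picture of confidence routing, the sketch below escalates an instance through model tiers until one of them is confident enough. It illustrates the general cascade pattern only. The `route` function, the tier representation, and the 0.8 cutoff are assumptions, not Solo Mode's actual interface.

```python
from typing import Callable, Sequence

# Hypothetical labeler type: takes text, returns (label, confidence in [0, 1]).
Labeler = Callable[[str], tuple[str, float]]


def route(
    text: str,
    tiers: Sequence[tuple[str, Labeler]],
    min_confidence: float = 0.8,
) -> tuple[str, str, float]:
    """Try tiers from cheapest to most expensive; stop at the first confident answer.

    Returns (tier_name, label, confidence). If no tier is confident, the
    answer from the last (most expensive) tier is returned.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    result = None
    for name, labeler in tiers:
        label, confidence = labeler(text)
        result = (name, label, confidence)
        if confidence >= min_confidence:
            break
    return result
```

The cost saving comes from the early exit: instances the cheap model handles confidently never reach the expensive tier.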
Troubleshooting
Low Agreement Rate
- Check if labels are clearly defined
- Review confusion patterns to find systematic errors
- Add examples for problematic cases
- Consider splitting ambiguous categories
LLM Not Labeling
- Verify API credentials are correct
- Check model availability
- Review logs for errors
- Try fallback models
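The config lists labeling models that are "tried in order". One plausible reading is a fallback chain in which a failing model hands over to the next one. The sketch below illustrates that reading. `label_with_fallback` is a hypothetical helper, and Solo Mode's real error handling may differ.

```python
from typing import Callable, Sequence


def label_with_fallback(
    text: str,
    models: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Call each model in order and return (model_name, label) from the first success.

    Errors are collected so that a total failure reports every cause.
    """
    errors = []
    for name, call_model in models:
        try:
            return name, call_model(text)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all labeling models failed: " + "; ".join(errors))
```

If every model fails, the combined error message is a good starting point when reviewing the logs.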
Slow Performance
- Reduce batch sizes
- Use faster models for labeling
- Limit parallel labels
Example Project
See examples/advanced/solo-mode/ for a complete working example.
Developer Documentation
For extending Solo Mode or understanding the implementation, see the Solo Mode Developer Guide.