Solo Mode

Solo Mode enables a single annotator to efficiently label large datasets with LLM assistance through collaborative annotation.

Overview

Solo Mode provides a streamlined workflow where a human annotator works alongside an LLM to annotate data. The system learns from human feedback, progressively improving its predictions until the human can step back and let the LLM complete the remaining annotations autonomously.

Key Features

Prompt Synthesis: Automatically generate annotation guidelines from task descriptions
Edge Case Testing: Generate and label difficult examples to refine prompts
Parallel Annotation: Human and LLM annotate simultaneously
Disagreement Resolution: Resolve conflicts between human and LLM labels
Uncertainty-Based Selection: Prioritize instances where LLM is uncertain
Progressive Autonomy: Transition to autonomous LLM labeling as agreement improves
Final Validation: Validate a sample of LLM-only labels
Edge Case Rule Discovery: Co-DETECT-inspired automatic rule extraction from low-confidence predictions
Labeling Functions: ALCHEmist-style pattern extraction for zero-cost labeling via majority voting
Confidence Routing: Cascaded model escalation from cheap to expensive models
Refinement Loop: Automated confusion analysis → guideline suggestion → prompt revision cycles
Confusion Analysis: Enriched confusion patterns with root cause analysis
Prompt Optimizer: DSPy-style automatic prompt improvement from labeled examples

Configuration

Enable Solo Mode in your project's config.yaml:

solo_mode:
  enabled: true

  # Models for labeling (tried in order)
  labeling_models:
    - endpoint_type: "anthropic"
      model: "claude-3-5-sonnet-20241022"
      api_key: "${ANTHROPIC_API_KEY}"
    - endpoint_type: "openai"
      model: "gpt-4o-mini"
      api_key: "${OPENAI_API_KEY}"

  # Models for prompt revision
  revision_models:
    - endpoint_type: "anthropic"
      model: "claude-3-5-sonnet-20241022"

  # Uncertainty estimation strategy
  uncertainty:
    strategy: "direct_confidence"  # Options: direct_confidence, direct_uncertainty, token_entropy, sampling_diversity
    num_samples: 5                 # For sampling_diversity
    sampling_temperature: 1.0      # For sampling_diversity

  # Thresholds
  thresholds:
    end_human_annotation_agreement: 0.90    # Required agreement rate to stop human annotation
    minimum_validation_sample: 50           # Minimum comparisons before ending
    confidence_low: 0.5                     # Low confidence threshold
    confidence_high: 0.8                    # High confidence threshold
    periodic_review_interval: 100           # Review LLM labels every N instances

  # Instance selection weights (must sum to 1.0)
  instance_selection:
    low_confidence_weight: 0.4    # Prioritize uncertain instances
    diversity_weight: 0.3         # Prioritize diverse instances
    random_weight: 0.2            # Random sample for calibration
    disagreement_weight: 0.1      # Prioritize prior disagreements

  # Batch sizes
  batches:
    llm_labeling_batch: 50        # Instances to label per batch
    max_parallel_labels: 200       # Max LLM labels ahead of human

  # Prompt optimization (optional)
  prompt_optimization:
    enabled: true
    find_smallest_model: true
    target_accuracy: 0.85

Workflow Phases

Solo Mode progresses through the following phases:

1. Setup

Enter task description
Upload data file
Generate initial prompt

2. Prompt Review

Review and edit the generated prompt
Add clarifying examples
Refine edge case handling

3. Edge Case Synthesis

LLM generates difficult examples
Examples test boundary conditions
Helps identify prompt weaknesses

4. Edge Case Labeling

Label the synthesized edge cases
Labels used to improve prompt
Validates prompt clarity

5. Prompt Validation

LLM relabels edge cases with improved prompt
Verify prompt improvements work
Iterate if necessary

6. Parallel Annotation

Human and LLM annotate simultaneously
LLM labels instances in background
Human labels prioritized instances

7. Disagreement Resolution

Review instances where human and LLM disagree
Decide on correct label
Improve understanding of edge cases

8. Periodic Review

Periodically review low-confidence LLM labels
Approve or correct predictions
Maintain quality during autonomous phase

9. Autonomous Labeling

Agreement threshold reached
LLM completes remaining instances
Human monitors progress

10. Final Validation

Validate sample of LLM-only labels
Confirm quality meets standards
Export final dataset

Uncertainty Estimation

Solo Mode uses uncertainty estimation to prioritize which instances the human should label. Four strategies are available:

Direct Confidence

Asks the LLM to rate its confidence (0-100). Simple and works with all models.

uncertainty:
  strategy: "direct_confidence"

Direct Uncertainty

Asks the LLM to rate its uncertainty directly. Alternative framing that may work better for some models.

uncertainty:
  strategy: "direct_uncertainty"

Token Entropy

Uses entropy of answer token probabilities. More objective but requires logprobs support (OpenAI, vLLM).

uncertainty:
  strategy: "token_entropy"

Sampling Diversity

Runs the LLM multiple times at high temperature and measures label diversity. Most accurate but expensive.

uncertainty:
  strategy: "sampling_diversity"
  num_samples: 5
  sampling_temperature: 1.0

Instance Selection

Instances are selected for human annotation using a weighted mixture:

Pool	Weight	Description
Low Confidence	40%	Instances where LLM is uncertain
Diverse	30%	Instances from different embedding clusters
Random	20%	Random sample for calibration
Disagreement	10%	Instances with prior human-LLM disagreement

Adjust weights in config:

instance_selection:
  low_confidence_weight: 0.4
  diversity_weight: 0.3
  random_weight: 0.2
  disagreement_weight: 0.1

API Endpoints

Solo Mode provides API endpoints for monitoring and control:

Status

GET /solo/api/status

Returns current phase, annotation stats, agreement metrics.

Prompts

GET /solo/api/prompts

Returns prompt version history.

Predictions

GET /solo/api/predictions

Returns all LLM predictions.

Control

POST /solo/api/advance-phase
POST /solo/api/pause-labeling
POST /solo/api/resume-labeling
POST /solo/api/optimize-prompt

Export

GET /solo/api/export

Exports all annotations and predictions.

Best Practices

Writing Good Task Descriptions

Be specific about what you're labeling
Define each label clearly
Explain what makes something ambiguous
Include examples of edge cases

Start with the generated prompt
Add examples for difficult cases
Clarify ambiguous criteria
Keep prompts concise but complete

Monitoring Progress

Check agreement rate regularly
Review confusion patterns
Adjust prompts when accuracy drops
Validate LLM-only labels periodically

Advanced Features

Solo Mode includes several advanced subsystems for automated quality improvement, cost optimization, and deeper analysis. These features work together to progressively refine annotation quality with minimal human effort.

Key advanced capabilities include: - Edge case rule discovery and labeling functions for reducing LLM API costs - Confidence routing for cascading instances through cheap-to-expensive model tiers - Automated refinement loops that analyze confusion patterns and revise prompts - Schema-specific agreement thresholds for Likert, multiselect, text, and span annotations

For full configuration details, see Solo Mode Advanced Features.

Troubleshooting

Low Agreement Rate

Check if labels are clearly defined
Review confusion patterns to find systematic errors
Add examples for problematic cases
Consider splitting ambiguous categories

LLM Not Labeling

Verify API credentials are correct
Check model availability
Review logs for errors
Try fallback models

Slow Performance

Reduce batch sizes
Use faster models for labeling
Limit parallel labels

Example Project

See examples/advanced/solo-mode/ for a complete working example.

Developer Documentation

For extending Solo Mode or understanding the implementation, see the Solo Mode Developer Guide.