Pairwise Comparison Annotation

The pairwise annotation schema allows annotators to compare two items side by side and indicate their preference. It supports two modes:

Binary Mode: Click the preferred tile (A or B); an optional tie button can be enabled
Scale Mode: Use a slider to rate how strongly one option is preferred over the other

Use Cases

Comparing model outputs (which response is better?)
Preference learning for RLHF training data
Quality comparison of translations, summaries, etc.
A/B testing analysis
Sentiment or intimacy comparisons

Configuration

Binary Mode (Default)

Binary mode displays two clickable tiles. Annotators click on their preferred option.

annotation_schemes:
  - annotation_type: pairwise
    name: preference
    description: "Which response is better?"
    mode: binary  # Optional, default is "binary"

    # Data source - key in instance data containing items to compare
    items_key: "responses"  # Expects a list with 2+ items

    # Display options
    show_labels: true         # Show A/B labels (default: true)
    labels:                   # Custom labels (default: ["A", "B"])
      - "Response A"
      - "Response B"

    # Tie option (opt-in)
    allow_tie: true           # Show "No preference" button
    tie_label: "No preference"  # Custom tie button text

    # Keyboard shortcuts
    sequential_key_binding: true  # Enable 1/2/0 shortcuts (default: true)

    # Validation
    label_requirement:
      required: true  # Require selection before proceeding

Scale Mode

Scale mode displays a slider between two items, allowing annotators to indicate the degree of preference.

annotation_schemes:
  - annotation_type: pairwise
    name: preference_scale
    description: "Rate how much better A is than B"
    mode: scale

    items_key: "responses"

    # Display labels for the two items
    labels:
      - "Response A"
      - "Response B"

    # Scale configuration
    scale:
      min: -3           # Negative = prefer left item (A)
      max: 3            # Positive = prefer right item (B)
      step: 1           # Slider step increment
      default: 0        # Initial slider position

      # Endpoint labels
      labels:
        min: "A is much better"
        max: "B is much better"
        center: "Equal"

    label_requirement:
      required: true
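
Before deploying a scale configuration, it can help to list the slider positions it implies. The following is a minimal sketch, not part of the tool: `slider_positions` is a hypothetical helper, and it assumes the slider starts at `min` and advances by `step` up to `max`. Whether the server performs similar validation itself is not stated here.

```python
def slider_positions(min_value, max_value, step):
    """Return every value reachable from min_value in increments of step.

    Hypothetical helper for sanity-checking a scale config; the actual
    slider behavior is an assumption.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if min_value >= max_value:
        raise ValueError("min must be smaller than max")
    positions = []
    value = min_value
    while value <= max_value:
        positions.append(value)
        value += step
    return positions


# Values copied from the scale block above.
scale = {"min": -3, "max": 3, "step": 1, "default": 0}
positions = slider_positions(scale["min"], scale["max"], scale["step"])

# The initial slider position should be one the annotator can return to.
default_is_reachable = scale["default"] in positions
print(positions, default_is_reachable)
```

A default that is not among the reachable positions, or a `max` that the step never lands on, is worth fixing in the config before annotators see it.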

Data Format

The schema expects instance data with a list of items to compare:

{"id": "1", "responses": ["Response A text", "Response B text"]}
{"id": "2", "responses": ["First option here", "Second option here"]}

The items_key configuration specifies which field contains the items to compare. The field should contain a list with at least 2 items.
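
A data file in this shape can be generated with a few lines of Python. This is a minimal sketch: `make_instances` and `to_jsonl` are hypothetical helper names, and the sequential string ids simply mirror the sample above.

```python
import json


def make_instances(pairs, items_key="responses"):
    """Build one instance dict per group of items, with ids "1", "2", ...

    items_key must match the items_key value in the schema config.
    """
    instances = []
    for i, items in enumerate(pairs, start=1):
        items = list(items)
        if len(items) < 2:
            raise ValueError(f"instance {i} needs at least 2 items, got {len(items)}")
        instances.append({"id": str(i), items_key: items})
    return instances


def to_jsonl(instances):
    """Serialize instances as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(inst) for inst in instances) + "\n"


pairs = [
    ("Response A text", "Response B text"),
    ("First option here", "Second option here"),
]
jsonl_text = to_jsonl(make_instances(pairs))
print(jsonl_text, end="")
```

Write `jsonl_text` to whichever data file your config points at.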

Output Format

Binary Mode Output

When the annotator selects option A:

{
  "preference": {
    "selection": "A"
  }
}

When the annotator selects the tie option:

{
  "preference": {
    "selection": "tie"
  }
}
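
Once annotations are collected, binary selections can be tallied directly. The sketch below assumes the per-instance records have already been loaded into Python dicts with the shape shown above; how they are stored on disk is not covered here, and `tally_selections` is a hypothetical helper. It also assumes that choosing the second tile is recorded as "B", by analogy with "A".

```python
from collections import Counter


def tally_selections(annotations, scheme_name="preference"):
    """Count how often each selection value appears for one scheme.

    Records that lack the scheme or the "selection" field are skipped.
    """
    counts = Counter()
    for record in annotations:
        selection = record.get(scheme_name, {}).get("selection")
        if selection is not None:
            counts[selection] += 1
    return counts


# Example records in the output shape shown above.
annotations = [
    {"preference": {"selection": "A"}},
    {"preference": {"selection": "tie"}},
    {"preference": {"selection": "A"}},
]
counts = tally_selections(annotations)
print(dict(counts))
```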

Scale Mode Output

The scale value indicates degree of preference:

- Negative values: Left item (A) is preferred
- Zero: No preference / Equal
- Positive values: Right item (B) is preferred

{
  "preference_scale": {
    "scale_value": "-2"
  }
}
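
Note that `scale_value` is stored as a string in the example above, so it needs converting before any arithmetic. A minimal sketch of the sign convention follows; `interpret_scale` is a hypothetical helper, and it parses the value as a float so that a fractional `step` would also work.

```python
def interpret_scale(record, scheme_name="preference_scale"):
    """Turn a scale-mode record into (preferred_side, strength).

    Negative values favor the left item (A), positive values favor the
    right item (B), and zero means no preference.
    """
    value = float(record[scheme_name]["scale_value"])
    if value < 0:
        return ("A", -value)
    if value > 0:
        return ("B", value)
    return ("tie", 0.0)


result = interpret_scale({"preference_scale": {"scale_value": "-2"}})
print(result)
```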

Keyboard Shortcuts

In binary mode with sequential_key_binding: true:

- 1: Select option A
- 2: Select option B
- 0: Select tie/no preference (if allow_tie: true)

Scale mode has no keyboard shortcuts; annotators interact with the slider directly.

Styling

The pairwise annotation uses CSS variables from the theme system:

- --primary: Selected tile border and accent color
- --border: Default tile border color
- --card: Tile background color
- --muted: Tie button background

Custom Styling

Add custom CSS to your site_dir/static/custom.css:

/* Make tiles taller */
.pairwise-tile {
  min-height: 200px;
}

/* Change selected tile highlight */
.pairwise-tile.selected {
  border-color: #10b981;
  background-color: rgba(16, 185, 129, 0.1);
}

Examples

Basic Binary Comparison

annotation_schemes:
  - annotation_type: pairwise
    name: quality
    description: "Which text is higher quality?"
    labels: ["Text A", "Text B"]
    allow_tie: true

Preference Scale with Custom Range

annotation_schemes:
  - annotation_type: pairwise
    name: sentiment_comparison
    description: "Compare the sentiment of these two statements"
    mode: scale
    labels: ["Statement A", "Statement B"]
    scale:
      min: -5
      max: 5
      step: 1
      labels:
        min: "A is much more positive"
        max: "B is much more positive"
        center: "Equal sentiment"

Multiple Pairwise Comparisons

You can include multiple pairwise schemas to compare on different dimensions:

annotation_schemes:
  - annotation_type: pairwise
    name: fluency
    description: "Which response is more fluent?"
    labels: ["Response A", "Response B"]

  - annotation_type: pairwise
    name: relevance
    description: "Which response is more relevant?"
    labels: ["Response A", "Response B"]

  - annotation_type: pairwise
    name: overall
    description: "Which response is better overall?"
    labels: ["Response A", "Response B"]
    allow_tie: true
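
The output examples above suggest that each scheme's result is keyed by its name, so with several pairwise schemes it is worth confirming that the names are distinct. The sketch below works on a config that has already been parsed into a Python dict; `duplicate_scheme_names` is a hypothetical helper, and the keyed-by-name behavior is an assumption drawn from those examples.

```python
from collections import Counter


def duplicate_scheme_names(config):
    """Return the scheme names that appear more than once, sorted."""
    names = [scheme["name"] for scheme in config.get("annotation_schemes", [])]
    return sorted(name for name, n in Counter(names).items() if n > 1)


# Parsed equivalent of the three-scheme config above (abbreviated).
config = {
    "annotation_schemes": [
        {"annotation_type": "pairwise", "name": "fluency"},
        {"annotation_type": "pairwise", "name": "relevance"},
        {"annotation_type": "pairwise", "name": "overall", "allow_tie": True},
    ]
}
duplicates = duplicate_scheme_names(config)
print(duplicates)
```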

Running the Example

# Binary mode example
python potato/flask_server.py start examples/classification/pairwise-comparison/config.yaml -p 8000

# Scale mode example
python potato/flask_server.py start examples/classification/pairwise-scale/config.yaml -p 8002

Then navigate to http://localhost:8000 (or http://localhost:8002 for the scale mode example) and register or log in to start annotating.

Related Documentation

Annotation Schemas - Overview of all annotation types
List as Text Display - Display lists with prefixes
Display Types - Different ways to display content