Tiered Annotation (ELAN-style Hierarchical Annotation)

Tiered annotation provides a hierarchical, multi-tier annotation interface for audio and video content. The schema is inspired by ELAN, a widely used tool for linguistic annotation, and supports parent-child relationships between annotation tiers.

Overview

Use tiered annotation when you need to:

Create hierarchical annotations (e.g., utterance → word → phoneme)
Annotate at multiple levels of granularity simultaneously
Maintain relationships between annotations at different levels
Export to ELAN (EAF) or Praat (TextGrid) formats

Quick Start

annotation_schemes:
  - annotation_type: tiered_annotation
    name: linguistic_tiers
    description: "Multi-tier linguistic annotation"
    source_field: audio_url
    media_type: audio

    tiers:
      # Independent tier - directly time-aligned
      - name: utterance
        tier_type: independent
        labels:
          - name: Speaker_A
            color: "#4ECDC4"
          - name: Speaker_B
            color: "#FF6B6B"

      # Dependent tier - child of utterance
      - name: word
        tier_type: dependent
        parent_tier: utterance
        constraint_type: time_subdivision
        labels:
          - name: Word
            color: "#95E1D3"

Configuration Reference

Required Fields

Field	Type	Description
`annotation_type`	string	Must be `"tiered_annotation"`
`name`	string	Unique schema identifier
`description`	string	Display description
`source_field`	string	Field name containing media URL
`tiers`	list	List of tier definitions

Optional Fields

Field	Type	Default	Description
`media_type`	string	`"audio"`	`"audio"` or `"video"`
`tier_height`	int	`50`	Height of each tier row in pixels
`show_tier_labels`	bool	`true`	Show tier name labels
`collapsed_tiers`	list	`[]`	Tier names to start collapsed
`zoom_enabled`	bool	`true`	Enable zoom controls
`playback_rate_control`	bool	`true`	Show playback speed controls
`overview_height`	int	`40`	Height of waveform overview

Tier Definition

Each entry in the `tiers` list accepts the following fields:

Field	Type	Required	Description
`name`	string	Yes	Unique tier identifier
`tier_type`	string	No	`"independent"` (default) or `"dependent"`
`parent_tier`	string	For dependent	Name of parent tier
`constraint_type`	string	No	Constraint relationship (see below)
`description`	string	No	Tier description/tooltip
`labels`	list	No	Available annotation labels
`linguistic_type`	string	No	ELAN linguistic type (for export)
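
The rules in the table above (unique names, a `parent_tier` for every dependent tier) can be checked before loading a config. The following is a minimal sketch, assuming the `tiers` list has already been parsed into Python dictionaries; `validate_tiers` is a hypothetical helper for illustration, not part of the tool's API.

```python
def validate_tiers(tiers):
    """Return a list of problems found in a list of tier definitions.

    Hypothetical helper for illustration; not part of the tool's API.
    """
    errors = []
    names = set()
    for tier in tiers:
        if tier["name"] in names:
            errors.append(f"{tier['name']}: duplicate tier name")
        names.add(tier["name"])

    for tier in tiers:
        # tier_type defaults to "independent" when omitted
        if tier.get("tier_type", "independent") != "dependent":
            continue
        parent = tier.get("parent_tier")
        if parent is None:
            errors.append(f"{tier['name']}: dependent tier needs parent_tier")
        elif parent == tier["name"] or parent not in names:
            errors.append(f"{tier['name']}: invalid parent_tier {parent!r}")
    return errors
```

An empty list means the definitions passed these checks; the tool itself may apply additional rules.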

Label Definition

Each label can have:

Field	Type	Description
`name`	string	Label text (required)
`color`	string	Hex color code (e.g., `"#4ECDC4"`)
`tooltip`	string	Hover tooltip text
`description`	string	Extended description

Constraint Types

Constraint types define how child annotations relate to their parent:

`time_subdivision`

Children must partition the parent's time span with no gaps. Each child annotation starts where the previous one ends.

Use case: Word → phoneme segmentation where phonemes are contiguous

- name: phoneme
  tier_type: dependent
  parent_tier: word
  constraint_type: time_subdivision
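
The partition rule can be illustrated with a small check. This sketch assumes that "partition" means the children cover the parent from its start to its end, and that annotations are dictionaries with `start_time` and `end_time` in milliseconds; `is_time_subdivision` is a hypothetical name, and the tool's own validation may differ (for example, while a parent is only partly segmented).

```python
def is_time_subdivision(parent, children):
    """True if children tile the parent's span with no gaps or overlaps.

    Illustrative only; assumes the children must cover the whole parent.
    """
    if not children:
        return False
    ordered = sorted(children, key=lambda c: c["start_time"])
    if ordered[0]["start_time"] != parent["start_time"]:
        return False
    if ordered[-1]["end_time"] != parent["end_time"]:
        return False
    # Each child must start exactly where the previous one ends
    return all(a["end_time"] == b["start_time"]
               for a, b in zip(ordered, ordered[1:]))
```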

`included_in`

Children must lie within the parent's time bounds but may have gaps between them.

Use case: Utterance → word where pauses between words are allowed

- name: word
  tier_type: dependent
  parent_tier: utterance
  constraint_type: included_in
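
A comparable sketch for this constraint: every child lies inside the parent, and gaps are allowed. The extra requirement that siblings do not overlap is an assumption made here, not something the text above states; `is_included_in` is a hypothetical name.

```python
def is_included_in(parent, children):
    """True if every child lies inside the parent's span; gaps are allowed.

    Illustrative only; sibling non-overlap is an added assumption.
    """
    ordered = sorted(children, key=lambda c: c["start_time"])
    inside = all(
        parent["start_time"] <= c["start_time"] < c["end_time"] <= parent["end_time"]
        for c in ordered
    )
    no_overlap = all(a["end_time"] <= b["start_time"]
                     for a, b in zip(ordered, ordered[1:]))
    return inside and no_overlap
```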

`symbolic_association`

Children are linked to the parent without independent time alignment. Each child shares its parent's time span.

Use case: Glosses or translations that apply to the whole parent segment

- name: translation
  tier_type: dependent
  parent_tier: utterance
  constraint_type: symbolic_association
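
One way to picture "shares the parent's time span" is to resolve a child's times by following its `parent_id`. This sketch assumes symbolic children are stored without `start_time` and `end_time` keys, which is an assumption about the storage layout rather than documented behavior; `resolve_times` and the `by_id` lookup are hypothetical.

```python
def resolve_times(annotation, by_id):
    """Return (start, end) in ms, walking up parent_id links when the
    annotation carries no times of its own.

    Illustrative only; assumes every untimed annotation has a parent_id
    that exists in by_id.
    """
    current = annotation
    while "start_time" not in current:
        current = by_id[current["parent_id"]]
    return current["start_time"], current["end_time"]
```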

`symbolic_subdivision`

Children subdivide the parent symbolically (ordered but without explicit times).

Use case: Morpheme analysis where exact boundaries aren't needed

- name: morpheme
  tier_type: dependent
  parent_tier: word
  constraint_type: symbolic_subdivision

Complete Example

annotation_schemes:
  - annotation_type: tiered_annotation
    name: discourse_annotation
    description: "Multi-level discourse annotation"
    source_field: audio_url
    media_type: audio

    tiers:
      # Level 1: Turn-taking
      - name: turn
        tier_type: independent
        description: "Speaker turns"
        labels:
          - name: Speaker_1
            color: "#4ECDC4"
            tooltip: "Primary speaker"
          - name: Speaker_2
            color: "#FF6B6B"
            tooltip: "Secondary speaker"
          - name: Overlap
            color: "#F39C12"
            tooltip: "Overlapping speech"

      # Level 2: Utterances within turns
      - name: utterance
        tier_type: dependent
        parent_tier: turn
        constraint_type: time_subdivision
        description: "Individual utterances"
        labels:
          - name: Statement
            color: "#95E1D3"
          - name: Question
            color: "#DDA0DD"
          - name: Backchannel
            color: "#87CEEB"

      # Level 3: Words within utterances
      - name: word
        tier_type: dependent
        parent_tier: utterance
        constraint_type: included_in
        description: "Word transcription"
        labels:
          - name: Word
            color: "#FFEAA7"

      # Independent tier for non-verbal
      - name: gesture
        tier_type: independent
        description: "Non-verbal gestures"
        labels:
          - name: Nod
            color: "#98D8C8"
          - name: Point
            color: "#C0C0C0"

    tier_height: 45
    show_tier_labels: true
    zoom_enabled: true
    playback_rate_control: true

User Interface

Timeline View

The annotation interface displays:

Media Player: Audio or video player at the top
Tier Toolbar: Active tier selector, label buttons, playback controls
Timeline: Multi-row display with one row per tier
Annotation List: Expandable list of all annotations

Creating Annotations

Select the target tier from the dropdown
Select a label from the label buttons
Click and drag on the tier's timeline to create the annotation

For dependent tiers:

- You must first create a parent annotation
- Child annotations must satisfy the constraint type
- The system will validate and show errors for invalid placements

Keyboard Shortcuts

Key	Action
`Space`	Play/Pause
`,`	Step backward 0.1 seconds
`.`	Step forward 0.1 seconds
`Delete`/`Backspace`	Delete selected annotation
`Escape`	Deselect annotation

Selecting and Editing

Click an annotation to select it
The media will seek to the annotation's start time
Use the annotation list panel to view all annotations
Delete with keyboard or the delete button in the list

Export Formats

ELAN (EAF)

Export to ELAN Annotation Format for use in ELAN:

python -m potato.export --format eaf --input annotations.json --output ./eaf_output/

The EAF export:

- Preserves tier hierarchy and constraint types
- Generates valid EAF 3.0 XML
- Includes media file references
- Creates ELAN linguistic types

Praat (TextGrid)

Export to Praat TextGrid format:

python -m potato.export --format textgrid --input annotations.json --output ./textgrid_output/

The TextGrid export:

- Creates interval tiers for each annotation tier
- Fills gaps with empty intervals
- Supports both long and short TextGrid formats

Note: TextGrid doesn't support hierarchical relationships, so the export flattens the structure while preserving all annotations.
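
Gap filling can be sketched as follows. Praat interval tiers are contiguous and use seconds, so the sketch converts from milliseconds and inserts empty-text intervals wherever a tier has no annotation. It assumes the annotations on the tier do not overlap; `fill_gaps` and `duration_ms` are hypothetical names, and the exporter's actual logic may differ.

```python
def fill_gaps(annotations, duration_ms):
    """Return contiguous (xmin, xmax, text) tuples in seconds that cover
    0..duration_ms, with empty text wherever the tier has no annotation.

    Illustrative only; assumes non-overlapping annotations on one tier.
    """
    intervals = []
    cursor = 0  # end of the last emitted interval, in ms
    for ann in sorted(annotations, key=lambda a: a["start_time"]):
        if ann["start_time"] > cursor:
            intervals.append((cursor / 1000, ann["start_time"] / 1000, ""))
        intervals.append(
            (ann["start_time"] / 1000, ann["end_time"] / 1000, ann["label"])
        )
        cursor = ann["end_time"]
    if cursor < duration_ms:
        intervals.append((cursor / 1000, duration_ms / 1000, ""))
    return intervals
```

For a single annotation from 1500 ms to 3200 ms in a 5000 ms recording, this yields three intervals: an empty one, the labelled one, and a trailing empty one.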

Instance Display Configuration

To display the media player, configure the instance display:

instance_display:
  - field_name: audio_url
    display_type: audio
    label: "Audio"

For video:

instance_display:
  - field_name: video_url
    display_type: video
    label: "Video"

Data Format

Input Data

[
  {
    "id": "sample_001",
    "text": "Sample description",
    "audio_url": "https://example.com/audio.wav"
  }
]

Output Format

Annotations are stored as JSON with tier-organized structure:

{
  "annotations": {
    "utterance": [
      {
        "id": "ann_1234_a1",
        "tier": "utterance",
        "start_time": 1500,
        "end_time": 3200,
        "label": "Speaker_A",
        "color": "#4ECDC4"
      }
    ],
    "word": [
      {
        "id": "ann_1234_a2",
        "tier": "word",
        "start_time": 1500,
        "end_time": 2000,
        "label": "Word",
        "parent_id": "ann_1234_a1"
      }
    ]
  },
  "time_slots": {
    "ts1": 1500,
    "ts2": 2000,
    "ts3": 3200
  }
}

Time values are in milliseconds.
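
To work with this structure downstream, the output can be indexed by `parent_id`. The sketch below also rebuilds a `time_slots` mapping from the sorted unique boundary times, which is consistent with the sample above; whether the tool computes `time_slots` this way is an assumption. Both function names are hypothetical.

```python
from collections import defaultdict


def children_by_parent(output):
    """Map each parent annotation id to the ids of its child annotations."""
    index = defaultdict(list)
    for tier_annotations in output["annotations"].values():
        for ann in tier_annotations:
            if "parent_id" in ann:
                index[ann["parent_id"]].append(ann["id"])
    return dict(index)


def time_slots(output):
    """Number the sorted unique boundary times (ms) as ts1, ts2, ...

    Assumed derivation; it matches the sample but is not confirmed.
    """
    times = sorted({
        t
        for tier_annotations in output["annotations"].values()
        for ann in tier_annotations
        for t in (ann["start_time"], ann["end_time"])
    })
    return {f"ts{i}": t for i, t in enumerate(times, start=1)}
```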

Troubleshooting

"No parent annotation covers this time range"

This error occurs when creating a dependent tier annotation outside any parent annotation. Solution:

1. Create a parent annotation first
2. Ensure the child annotation is within the parent's time bounds
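
The condition behind this error can be pictured as a containment lookup: a child's time range needs one parent annotation that fully contains it. This is a sketch under that assumption, with times in milliseconds; `find_parent` is a hypothetical helper, not the tool's actual validator.

```python
def find_parent(parents, start_ms, end_ms):
    """Return the first parent annotation whose span contains
    [start_ms, end_ms], or None if no parent covers the range.

    Illustrative only.
    """
    for parent in parents:
        if parent["start_time"] <= start_ms and end_ms <= parent["end_time"]:
            return parent
    return None
```

A range that starts inside a parent but ends after it returns `None`, which corresponds to the case described above.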

Waveform not displaying

The waveform visualization requires:

1. Peaks.js library (included automatically)
2. Web Audio API support in the browser
3. CORS-enabled media files for cross-origin resources

If the waveform doesn't load, the timeline will still work for annotation.

Export issues

For EAF export, ensure:

- All tier names are valid identifiers (no special characters)
- Parent-child relationships are consistent
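
Tier names can be screened before export. The exact rule for a "valid identifier" is not specified here, so this sketch assumes a conservative pattern (a letter or underscore followed by letters, digits, or underscores); `invalid_tier_names` is a hypothetical helper.

```python
import re

# Assumed rule: a letter or underscore, then letters, digits, or underscores.
TIER_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_tier_names(tiers):
    """Return the tier names that do not match the assumed pattern."""
    return [t["name"] for t in tiers if not TIER_NAME.fullmatch(t["name"])]
```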

For TextGrid export:

- Overlapping annotations on the same tier will be merged or may cause issues
- Very small gaps between annotations may be filled
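
Overlaps on a single tier can be located before exporting. This sketch sorts a tier's annotations by start time and reports neighbouring pairs where the second begins before the first ends; it compares adjacent annotations only, so a long annotation spanning several later ones is reported once. `find_overlaps` is a hypothetical helper.

```python
def find_overlaps(annotations):
    """Return (id, id) pairs of neighbouring annotations that overlap in time.

    Illustrative only; checks adjacent annotations after sorting.
    """
    ordered = sorted(annotations, key=lambda a: (a["start_time"], a["end_time"]))
    return [
        (a["id"], b["id"])
        for a, b in zip(ordered, ordered[1:])
        if b["start_time"] < a["end_time"]
    ]
```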

Related Documentation

Audio Annotation - Single-tier audio annotation
Video Annotation - Video annotation with segments
Export Formats - Available export formats