Export Formats

Potato supports exporting annotations to multiple industry-standard formats for use with machine learning frameworks, other annotation tools, and data pipelines.

Overview

Potato's annotation pipeline works in two stages:

1. Live Persistence: During annotation, all data is automatically saved as per-user user_state.json files inside output_annotation_dir.
2. Export: After annotation, use the Export CLI or admin API to convert annotations into analysis-ready formats (JSON, CSV, COCO, YOLO, CoNLL, etc.).

Live Annotation Storage

Configuration

output_annotation_dir: annotation_output/

During annotation, Potato automatically persists all user state to JSON files:

annotation_output/
├── user1/
│   └── user_state.json
├── user2/
│   └── user_state.json
└── ...

Each user_state.json contains the complete annotation state for that user:

{
    "user_id": "annotator_1",
    "instance_id_to_label_to_value": {
        "item_001": {
            "sentiment": {"labels": {"positive": true}}
        }
    },
    "instance_id_to_span_to_value": {
        "item_001": {
            "ner": [
                {"start": 0, "end": 5, "label": "PERSON", "text": "Alice"}
            ]
        }
    }
}
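
For a quick look at the raw state without running an export, the per-user files can be read directly. The following is a minimal sketch that assumes the directory layout and the instance_id_to_label_to_value key shown above; the function name load_label_rows is hypothetical and not part of Potato.

```python
import json
from pathlib import Path


def load_label_rows(output_dir):
    """Yield (user_id, instance_id, schema, value) from every user_state.json.

    Assumes the layout <output_dir>/<user>/user_state.json and the
    "instance_id_to_label_to_value" key from the example above.
    """
    for state_path in sorted(Path(output_dir).glob("*/user_state.json")):
        state = json.loads(state_path.read_text(encoding="utf-8"))
        # Fall back to the directory name if the file carries no user_id.
        user_id = state.get("user_id", state_path.parent.name)
        labels = state.get("instance_id_to_label_to_value", {})
        for instance_id, by_schema in labels.items():
            for schema, value in by_schema.items():
                yield user_id, instance_id, schema, value
```

Calling list(load_label_rows("annotation_output/")) gives one tuple per labeled schema per item, which is often enough to spot-check progress before exporting.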

Note: The older output_annotation_format config key is legacy and has no effect. Use export_annotation_format for auto-export (see below).

Auto-Export

You can configure Potato to automatically export annotations in additional formats during annotation. Exports are written to {output_annotation_dir}/exports/{format}/.

# Single format
export_annotation_format: "csv"

# Multiple formats
export_annotation_format:
  - "csv"
  - "jsonl"

# Control how often auto-export runs (default: 60 seconds)
auto_export_interval: 60

Supported auto-export formats include csv, tsv, jsonl, parquet, coco, yolo, conll_2003, and all other registered exporters. Run python -m potato.export --list-formats to see all available formats.
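
The {output_annotation_dir}/exports/{format}/ pattern described at the start of this section can be resolved in a few lines when a downstream script needs to locate auto-exported files. This is only an illustrative sketch: auto_export_dirs is a hypothetical helper, and it assumes the config value is either a single string or a list, as in the examples above.

```python
from pathlib import Path


def auto_export_dirs(output_annotation_dir, export_annotation_format):
    """Return {format: expected export directory} for the configured formats."""
    # The config value may be a single string or a list of strings.
    if isinstance(export_annotation_format, str):
        formats = [export_annotation_format]
    else:
        formats = list(export_annotation_format)
    base = Path(output_annotation_dir) / "exports"
    return {fmt: base / fmt for fmt in formats}


# Example: the multi-format configuration shown above.
dirs = auto_export_dirs("annotation_output/", ["csv", "jsonl"])
```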

Export CLI

The export CLI converts Potato annotations to specialized formats.

Basic Usage

# List available export formats
python -m potato.export --list-formats

# Export to COCO format
python -m potato.export --config config.yaml --format coco --output ./export/

# Export to YOLO format
python -m potato.export --config config.yaml --format yolo --output ./export/

# Export with options
python -m potato.export --config config.yaml --format coco --output ./export/ \
    --option split_ratio=0.8 --option include_unlabeled=false

Command Options

Option	Description
`--config`, `-c`	Path to Potato YAML config file
`--format`, `-f`	Export format (coco, yolo, pascal_voc, etc.)
`--output`, `-o`	Output directory (default: ./export_output)
`--option`	Format-specific option as key=value (repeatable)
`--list-formats`	List available formats and exit
`--verbose`, `-v`	Enable verbose logging
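
When exports are triggered from a script, the command line can be assembled from the options in the table above. The sketch below only builds the argument list; build_export_command is a hypothetical helper, and lower-casing booleans is an assumption based on the include_unlabeled=false example.

```python
import sys


def build_export_command(config, fmt, output, options=None, verbose=False):
    """Build the argument list for `python -m potato.export`."""
    cmd = [sys.executable, "-m", "potato.export",
           "--config", config, "--format", fmt, "--output", output]
    for key, value in (options or {}).items():
        # Assumption: booleans are written as true/false, as in the examples.
        if isinstance(value, bool):
            value = str(value).lower()
        cmd += ["--option", f"{key}={value}"]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) so that a failed export raises an exception instead of passing silently.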

Supported Export Formats

COCO (coco)

The Common Objects in Context format, widely used for object detection and instance segmentation.

Best for: Image bounding boxes, polygons, keypoints

Output Structure:

export/
├── annotations/
│   └── instances.json
└── images/
    └── (symlinked or copied images)

annotations/instances.json:

{
    "info": {"description": "Potato export", "version": "1.0"},
    "licenses": [],
    "images": [
        {"id": 1, "file_name": "image_001.jpg", "width": 1920, "height": 1080}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [100, 50, 200, 300],
            "area": 60000,
            "segmentation": [[100, 50, 300, 50, 300, 350, 100, 350]],
            "iscrowd": 0
        }
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "object"}
    ]
}

Usage:

python -m potato.export -c config.yaml -f coco -o ./coco_export/
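
Before training on a COCO export, it can help to confirm that instances.json is internally consistent. The sketch below checks only the fields shown in the example above (image and category references, and that each box lies inside its image); check_coco is a hypothetical helper, not a Potato function.

```python
def check_coco(coco):
    """Return a list of human-readable problems found in a COCO dict."""
    problems = []
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    for ann in coco.get("annotations", []):
        image = images.get(ann["image_id"])
        if image is None:
            problems.append(f"annotation {ann['id']}: unknown image_id {ann['image_id']}")
            continue
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']}: unknown category_id {ann['category_id']}")
        # COCO boxes are [x_min, y_min, width, height] in pixels.
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann['id']}: bbox outside image bounds")
    return problems
```

To run it on an export, load the file with json.load and pass the resulting dict; an empty list means none of these checks failed.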

YOLO (yolo)

YOLO format for object detection, with one text file per image.

Best for: Object detection training with YOLO models

Output Structure:

export/
├── images/
│   ├── train/
│   │   └── image_001.jpg
│   └── val/
│       └── image_002.jpg
├── labels/
│   ├── train/
│   │   └── image_001.txt
│   └── val/
│       └── image_002.txt
├── data.yaml
└── classes.txt

Label File Format (image_001.txt):

# class_id center_x center_y width height (normalized 0-1)
0 0.5 0.5 0.25 0.35
1 0.3 0.4 0.15 0.20

data.yaml:

train: ./images/train
val: ./images/val
nc: 3
names: ['person', 'vehicle', 'object']

Usage:

python -m potato.export -c config.yaml -f yolo -o ./yolo_export/ \
    --option split_ratio=0.8

Options:

- split_ratio: Train/val split ratio (default: 0.8)
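
Because YOLO label values are normalized to the 0-1 range, they have to be scaled by the image size to recover pixel coordinates. The following sketch shows that conversion for a single label line; yolo_to_pixel_box is a hypothetical helper, and the image dimensions must be supplied by the caller.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max)."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized center and size to pixels.
    cx, w = float(cx) * image_width, float(w) * image_width
    cy, h = float(cy) * image_height, float(h) * image_height
    # Move from center/size to corner coordinates.
    return int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


# First line of the label file example, on a 1920x1080 image.
box = yolo_to_pixel_box("0 0.5 0.5 0.25 0.35", 1920, 1080)
```

For the first example line this gives a box of roughly (720, 351) to (1200, 729) for class 0.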

Pascal VOC (pascal_voc)

Pascal Visual Object Classes format using XML annotation files.

Best for: Object detection, compatible with many CV frameworks

Output Structure:

export/
├── Annotations/
│   └── image_001.xml
├── ImageSets/
│   └── Main/
│       ├── train.txt
│       └── val.txt
└── JPEGImages/
    └── image_001.jpg

Annotation XML:

<annotation>
    <folder>JPEGImages</folder>
    <filename>image_001.jpg</filename>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <object>
        <name>person</name>
        <bndbox>
            <xmin>100</xmin>
            <ymin>50</ymin>
            <xmax>300</xmax>
            <ymax>350</ymax>
        </bndbox>
    </object>
</annotation>

Usage:

python -m potato.export -c config.yaml -f pascal_voc -o ./voc_export/
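
The annotation XML can be read back with the Python standard library. This sketch extracts the object name and corner coordinates from the elements shown in the example above; read_voc_boxes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return [(name, xmin, ymin, xmax, ymax), ...] from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            int(bndbox.findtext("xmin")),
            int(bndbox.findtext("ymin")),
            int(bndbox.findtext("xmax")),
            int(bndbox.findtext("ymax")),
        ))
    return boxes
```

Note that Pascal VOC stores corner coordinates (xmin, ymin, xmax, ymax), whereas COCO stores [x, y, width, height], so the same box appears as 100, 50, 300, 350 here and as [100, 50, 200, 300] in the COCO example.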

CoNLL-2003 (conll_2003)

CoNLL-2003 format for named entity recognition.

Best for: NER/span annotations, sequence labeling

Output Format:

-DOCSTART- -X- O O

Alice B-PERSON
went O O
to O O
Paris B-LOCATION
. O O

Bob B-PERSON
works O O
at O O
Google B-ORGANIZATION
. O O

Usage:

python -m potato.export -c config.yaml -f conll_2003 -o ./conll_export/

Options:

- tag_scheme: BIO, BIOES, or IOB (default: BIO)
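
A CoNLL-style file can be read back into sentences of (token, tag) pairs with a few lines of Python. The sketch below assumes the token is the first whitespace-separated column and the entity tag is the last one, which holds for every line of the example above regardless of how many columns it has; read_conll is a hypothetical helper.

```python
def read_conll(text):
    """Split CoNLL-style text into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences
```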

CoNLL-U (conll_u)

Universal Dependencies CoNLL-U format for linguistic annotation.

Best for: POS tagging, dependency parsing, morphological analysis

Output Format:

# sent_id = 1
# text = Alice went to Paris.
1	Alice	Alice	PROPN	NNP	Number=Sing	2	nsubj	_	_
2	went	go	VERB	VBD	Tense=Past	0	root	_	_
3	to	to	ADP	IN	_	4	case	_	_
4	Paris	Paris	PROPN	NNP	Number=Sing	2	obl	_	SpaceAfter=No
5	.	.	PUNCT	.	_	2	punct	_	_

Usage:

python -m potato.export -c config.yaml -f conll_u -o ./conllu_export/
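
Each CoNLL-U token line has ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), so a sentence block can be turned into dictionaries directly. This is a minimal sketch that ignores multiword tokens and empty nodes; read_conllu_sentence is a hypothetical helper.

```python
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def read_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and comment lines such as "# sent_id = 1".
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated fields, got {len(values)}: {line!r}")
        tokens.append(dict(zip(CONLLU_FIELDS, values)))
    return tokens
```

The token whose head is "0" is the root of the sentence, which is "went" in the example above.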

Segmentation Masks (mask)

Export polygon/segmentation annotations as binary mask images.

Best for: Semantic segmentation, instance segmentation

Output Structure:

export/
├── images/
│   └── image_001.jpg
├── masks/
│   └── image_001.png
└── class_mapping.json

Mask Format:

- PNG images with pixel values corresponding to class IDs
- 0 = background, 1+ = class indices

Usage:

python -m potato.export -c config.yaml -f mask -o ./mask_export/

Parquet (parquet)

Columnar format for efficient analytics. Produces structured tables for annotations, spans, and source items.

Best for: Large-scale analysis with pandas, DuckDB, Spark, or any Arrow-compatible tool

Requires: pyarrow >= 12.0.0 (pip install pyarrow)

Output Structure:

export/
├── annotations.parquet    # One row per (instance_id, user_id) pair
├── spans.parquet          # One row per span annotation (if spans exist)
└── items.parquet          # One row per original data item (optional)

annotations.parquet schema:

Column	Type	Description
`instance_id`	string	The annotated item's ID
`user_id`	string	The annotator's ID
`<schema_name>`	varies	One column per annotation schema; the type depends on the schema

Schema columns are flattened by annotation type:

- radio/select → string (the selected label)
- likert/slider/number → float64
- multiselect → list<string> (selected labels)
- text → string

spans.parquet schema:

Column	Type	Description
`instance_id`	string	The annotated item's ID
`user_id`	string	The annotator's ID
`schema_name`	string	Name of the span annotation schema
`start`	int	Character offset where the span begins
`end`	int	Character offset where the span ends
`label`	string	The span's label
`text`	string	The text content of the span

items.parquet schema:

Column	Type	Description
`item_id`	string	The item's ID
`<field_name>`	varies	One column per field in the original data (nested dicts/lists are JSON-serialized)

Usage:

python -m potato.export -c config.yaml -f parquet -o ./parquet_export/

Options:

Option	Default	Description
`compression`	`snappy`	Compression codec: `snappy`, `gzip`, `zstd`, `lz4`, or `none`
`include_items`	`true`	Generate `items.parquet` with source data
`include_spans`	`true`	Generate `spans.parquet` (if span annotations exist)
`row_group_size`	PyArrow default	Row group size for `annotations.parquet`

# Export with gzip compression, skip items table
python -m potato.export -c config.yaml -f parquet -o ./parquet_export/ \
    --option compression=gzip --option include_items=false

Reading with pandas:

import pandas as pd

annotations = pd.read_parquet("export/annotations.parquet")
spans = pd.read_parquet("export/spans.parquet")
items = pd.read_parquet("export/items.parquet")

# Filter to a specific annotator
user_anns = annotations[annotations["user_id"] == "annotator_1"]

Reading with DuckDB:

-- Direct query without loading into memory
SELECT instance_id, sentiment, COUNT(*) as n
FROM 'export/annotations.parquet'
GROUP BY instance_id, sentiment;

-- Join annotations with source items
SELECT a.instance_id, a.sentiment, i.text
FROM 'export/annotations.parquet' a
JOIN 'export/items.parquet' i ON a.instance_id = i.item_id;

CSV (csv)

Export annotations as comma-separated values with one row per annotation.

python -m potato.export --config config.yaml --format csv --output ./export/

TSV (tsv)

Export annotations as tab-separated values. Same structure as CSV but with tab delimiters.

python -m potato.export --config config.yaml --format tsv --output ./export/

JSONL (jsonl)

Export annotations as JSON Lines (one JSON object per line). Preserves full annotation structure.

python -m potato.export --config config.yaml --format jsonl --output ./export/
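
Since JSON Lines stores one JSON object per line, an export can be read back with the standard library alone. The sketch below makes no assumption about which fields each record contains; read_jsonl is a hypothetical helper, and the file path is whatever the exporter wrote under the output directory.

```python
import json


def read_jsonl(path):
    """Read a JSON Lines file into a list of Python objects."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line to make broken exports easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
    return records
```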

EAF - ELAN Annotation Format (eaf)

Export tiered annotations as ELAN EAF XML files for use with ELAN, a tool for linguistic and phonetic annotation of audio/video.

python -m potato.export --config config.yaml --format eaf --output ./export/

TextGrid - Praat (textgrid)

Export tiered annotations as Praat TextGrid files for use with Praat, a tool for phonetic analysis.

python -m potato.export --config config.yaml --format textgrid --output ./export/

Agent Evaluation (agent_eval)

Export agent trace evaluation results with aggregated scores, step-level ratings, and error taxonomies.

python -m potato.export --config config.yaml --format agent_eval --output ./export/

Coding Agent Evaluation (coding_eval)

Export coding agent evaluation results including process reward model (PRM) labels, code review annotations, DPO pairs, and SWE-bench compatibility scores.

python -m potato.export --config config.yaml --format coding_eval --output ./export/

HuggingFace Datasets (huggingface)

Export annotations directly as a HuggingFace Dataset. See HuggingFace Hub Export for detailed options.

python -m potato.export --config config.yaml --format huggingface --output ./export/

Programmatic Export

Use the export registry directly in Python:

from potato.export.registry import export_registry
from potato.export.cli import build_export_context

# Build context from config
context = build_export_context("path/to/config.yaml")

# Export to COCO
result = export_registry.export("coco", context, "./output/")

if result.success:
    print(f"Exported {len(result.files_written)} files")
    print(f"Stats: {result.stats}")
else:
    print(f"Errors: {result.errors}")

Custom Exporters

Create custom exporters by subclassing BaseExporter:

from typing import Optional, Tuple

from potato.export.base import BaseExporter, ExportContext, ExportResult
from potato.export.registry import export_registry


class MyExporter(BaseExporter):
    format_name = "my_format"
    description = "My custom export format"
    file_extensions = [".myformat"]

    def can_export(self, context: ExportContext) -> Tuple[bool, Optional[str]]:
        # Return (True, None) if this exporter can handle the context,
        # otherwise (False, reason)
        has_spans = any(ann.get("spans") for ann in context.annotations)
        if not has_spans:
            return False, "No span annotations found"
        return True, None

    def export(self, context: ExportContext, output_path: str,
               options: Optional[dict] = None) -> ExportResult:
        # Perform the export
        # ...
        return ExportResult(
            success=True,
            format_name=self.format_name,
            files_written=["output.myformat"],
            stats={"annotations": len(context.annotations)}
        )


# Register the exporter
export_registry.register(MyExporter())

Format Compatibility Matrix

Annotation Type	COCO	YOLO	Pascal VOC	CoNLL-2003	CoNLL-U	Mask	Parquet	CSV/TSV	EAF/TextGrid	Agent Eval
Bounding boxes	Yes	Yes	Yes	-	-	-	Yes	Yes	-	-
Polygons	Yes	-	-	-	-	Yes	Yes	-	-	-
Keypoints	Yes	-	-	-	-	-	Yes	-	-	-
Text spans	-	-	-	Yes	Yes	-	Yes	Yes	-	-
Classifications	Partial	-	-	-	-	-	Yes	Yes	-	-
Tiered segments	-	-	-	-	-	-	Yes	-	Yes	-
Agent traces	-	-	-	-	-	-	Yes	-	-	Yes
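
When a project mixes annotation types, the matrix above can be encoded as data to find the formats that cover all of them. The sketch below transcribes the table (treating "Partial" as supported) into a dictionary; COMPATIBILITY and formats_for are hypothetical names, not part of Potato.

```python
# Transcribed from the compatibility matrix; "Partial" counts as supported.
COMPATIBILITY = {
    "Bounding boxes": {"COCO", "YOLO", "Pascal VOC", "Parquet", "CSV/TSV"},
    "Polygons": {"COCO", "Mask", "Parquet"},
    "Keypoints": {"COCO", "Parquet"},
    "Text spans": {"CoNLL-2003", "CoNLL-U", "Parquet", "CSV/TSV"},
    "Classifications": {"COCO", "Parquet", "CSV/TSV"},  # COCO is partial
    "Tiered segments": {"Parquet", "EAF/TextGrid"},
    "Agent traces": {"Parquet", "Agent Eval"},
}


def formats_for(*annotation_types):
    """Return the formats that support every given annotation type."""
    sets = [COMPATIBILITY[name] for name in annotation_types]
    return sorted(set.intersection(*sets)) if sets else []


# Formats that can hold both boxes and polygons.
both = formats_for("Bounding boxes", "Polygons")
```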

Best Practices

1. Choose the right format for your task:
   - Object detection → COCO, YOLO, or Pascal VOC
   - NER/Sequence labeling → CoNLL-2003
   - Linguistic analysis → CoNLL-U
   - Segmentation → Mask or COCO with segmentation
2. Validate exports before training:
   - Use format-specific validation tools
   - Check that all images/items are exported
   - Verify label distributions
3. Handle missing data:
   - Use --option include_unlabeled=false to skip unannotated items
   - Check export warnings for skipped items
4. Use consistent splits:
   - Set split_ratio for reproducible train/val splits
   - Or manage splits externally and export separately
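
For splits managed externally, one common approach is to derive the split from a hash of the item ID, so an item always lands in the same split no matter when or in what order it is exported. The sketch below illustrates that idea only; it is not how Potato's split_ratio option is implemented, and assign_split is a hypothetical helper.

```python
import hashlib


def assign_split(item_id, train_ratio=0.8):
    """Deterministically map an item ID to "train" or "val"."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "train" if bucket < train_ratio else "val"
```

Because the assignment depends only on the ID, adding new items later does not move existing items between splits.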

Troubleshooting

No Annotations Exported

- Check that the annotation output directory exists
- Verify that users have completed annotations
- Check that the annotation type is supported by the export format

Image Paths Not Found

- Ensure that image paths in the data are accessible
- Use absolute paths or paths relative to the config file
- Check for URL vs. local file path issues

Label Mismatch

Verify label names match between schema and export
Check for case sensitivity issues
Ensure category IDs are consistent

Exporting via Admin API

All export formats are available through the admin API, allowing exports without CLI access. This is useful for remote deployments, HuggingFace Spaces, or integrating exports into automated workflows.

List Available Formats

curl http://localhost:8000/admin/api/export/formats \
  -H "X-API-Key: YOUR_ADMIN_KEY"

Run an Export

curl -X POST http://localhost:8000/admin/api/export \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_ADMIN_KEY" \
  -d '{
    "format": "coco",
    "output": "/path/to/output",
    "options": {}
  }'

The endpoint accepts any format returned by the formats listing endpoint. Format-specific options are passed in the options field.
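
The same request can be issued from Python with the standard library. The sketch below only builds the request object, mirroring the URL, headers, and JSON body of the curl example above; build_export_request is a hypothetical helper, and the shape of the server's response is not specified here.

```python
import json
import urllib.request


def build_export_request(base_url, api_key, fmt, output, options=None):
    """Build a POST request for the admin export endpoint."""
    payload = {"format": fmt, "output": output, "options": options or {}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/admin/api/export",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_export_request(
    "http://localhost:8000", "YOUR_ADMIN_KEY", "coco", "/path/to/output"
)
```

Passing the request to urllib.request.urlopen sends it; an HTTP error status raises urllib.error.HTTPError, which is a convenient place to surface a wrong API key or an unknown format.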

See HuggingFace Hub Export for HuggingFace-specific options and HuggingFace Spaces for remote deployment guidance.

Related Documentation

- Data Format - Input data format
- Configuration - Output configuration options
- Image Annotation - Bounding box and polygon annotation
- Schemas and Templates - All annotation types
