Search

Universal full-text search over instance text, backed by SQLite FTS5. Not gated to QDA Mode — admins/adjudicators can search any project to locate instances. An optional, guarded annotator search-and-claim lets annotators pull rare candidates into their own queue.

Overview

Lexical search via a SearchBackend abstraction. FTS5 ships now; a VectorBackend stub documents the contract for future semantic search.
The index is built from instance text on server start and lives in the universal <task_dir>/project.sqlite (instance_fts table).
If the SQLite build lacks FTS5, search is cleanly disabled (endpoints return 503); the rest of Potato is unaffected.

Configuration

search:
  enabled: true            # default true (universal)
  backend: fts5            # only fts5 in this release
  max_instances: 100000    # cap on indexed instances
  annotator_claim: false   # opt-in annotator search-and-claim (guarded)

Option	Default	Description
`search.enabled`	`true`	Build the index and enable endpoints.
`search.backend`	`fts5`	Search backend.
`search.max_instances`	`100000`	Maximum instances indexed.
`search.annotator_claim`	`false`	Enable annotator-facing search + claim (see guard below).

Endpoints

GET /admin/api/search?q=<query>&limit=<n> — admin/adjudicator, read-only. Always safe (no self-selection). Requires the admin API key (X-API-Key) or adjudicator status.
GET /api/search?q= — annotator search (only when annotator_claim: true).
POST /api/search/claim {instance_id} — pull a matching instance into the annotator's queue (only when annotator_claim: true).

User queries are tokenized and quoted before hitting FTS5, so arbitrary punctuation (including injection attempts) is safe and never interpreted as FTS5 syntax.

Annotator search-and-claim: compatibility guard

Letting annotators search and claim instances is self-selection, which corrupts designs where the platform — not the annotator — must choose the next item. When search.annotator_claim: true, Potato refuses to start (raises a configuration error naming the conflict) if any of these are also configured:

Conflicting feature	Why
`assignment_strategy`: random / diversity_clustering / max_diversity / active_learning / llm_confidence / least_annotated / category_based	Self-selection breaks sampling/ordering
`max_annotations_per_item` / `num_annotators_per_item` / `min_annotators_per_instance` > 1	IAA overlap can't be guaranteed
`attention_checks.enabled` / `gold_standards.enabled`	QC items could be located/avoided
`icl_labeling.enabled`	Blind LLM-verification tasks must not be findable
`adjudication.enabled`	The adjudication queue is curated
MTurk / Prolific backend	HIT = the assigned unit; breaks payment/coverage

Annotator claim is supported with solo_mode/qda_mode (single coder over the whole corpus), or fixed_order assignment without overlap, QC injection, ICL verification, adjudication, or a crowd backend. For every other design, use read-only admin search instead.

Note: under fixed_order the whole corpus is typically pre-assigned to a user, so claim is most useful when per-user assignment is capped (max_annotations_per_user) or instances are assigned incrementally.

Example

python potato/flask_server.py start examples/advanced/search-example/config.yaml -p 8000
# then, with the admin key from the config:
curl -H "X-API-Key: search-example-key" \
  "http://localhost:8000/admin/api/search?q=rare"