Semantic Curation (Catalog)¶
Find what to review by similarity, not just rules or uncertainty. An embedding index over your items powers similarity search ("find traces like this failure") and dynamic slices — saved semantic + metadata filters that auto-include new matching traces and curate into datasets. This is the LabelBox-Catalog-style discovery layer; it complements triage (signal rules) and active learning (model uncertainty).
Enabling¶
curation:
enabled: true
model_name: all-MiniLM-L6-v2 # any sentence-transformers model
embed_on_ingest: false # set true to index runtime-ingested traces on arrival
text_key: task_description # which field to embed (defaults to the item text)
Embeddings are lazy — sentence-transformers is imported only when you build
the index, never at startup (so boot stays fast). Install it with
pip install sentence-transformers, or wire a custom embedder. When enabled, the
admin dashboard shows a Catalog link.
Build the index¶
The index is built on demand (or incrementally on ingest with embed_on_ingest):
curl -X POST localhost:8000/admin/catalog/api/build -H "X-API-Key: <key>"
# {"indexed": 1234}
Similarity search¶
Search by free-text query or by an anchor instance (find neighbours of a known example). Results are ranked by cosine similarity, with an adjustable threshold.
curl -X POST localhost:8000/admin/catalog/api/search -H "X-API-Key: <key>" \
-H "Content-Type: application/json" \
-d '{"query": "tool call failed", "top_k": 10, "threshold": 0.3}'
# or: {"anchor_id": "trace-42", ...} (excludes the anchor itself)
Dynamic slices¶
A slice is a saved filter that resolves on demand against the current index — so traces ingested after you saved it are automatically included if they match. A slice combines (optional) semantic neighborhood with a metadata filter (the shared condition grammar):
curl -X POST localhost:8000/admin/catalog/api/slices -H "X-API-Key: <key>" \
-H "Content-Type: application/json" \
-d '{"name": "tool-errors", "query": "tool call failed", "threshold": 0.3,
"metadata_filter": [{"field": "metadata.outcome", "equals": "error"}]}'
curl localhost:8000/admin/catalog/api/slices/tool-errors/resolve -H "X-API-Key: <key>"
# {"count": 17, "instance_ids": [...]}
Curate a slice into a dataset¶
curl -X POST localhost:8000/admin/catalog/api/slices/tool-errors/to_dataset \
-H "X-API-Key: <key>" -H "Content-Type: application/json" \
-d '{"dataset": "tool-errors-to-fix", "include_annotations": false}'
The resolved instances become examples in the named dataset, ready for annotation, experiments, or SFT/DPO export.
Discover failure modes (bottom-up taxonomy)¶
Where the MAST taxonomy tags traces against a fixed known set, discovery builds a project-specific taxonomy bottom-up — the qualitative open/axial-coding workflow over agent traces. On the Catalog page, Discover failure modes clusters the indexed traces and asks the judge to propose a candidate label + description per cluster from representative examples; you then confirm or edit each code (a cluster the judge can't name shows as "unlabeled — add a code").
curl -X POST localhost:8000/admin/catalog/api/discover -H "X-API-Key: <key>" \
-H "Content-Type: application/json" -d '{"k": 6}'
# -> {"clusters": [{size, suggested_label, suggested_description, examples, member_ids}, ...]}
Clustering is deterministic spherical k-means over the embedding index (pure Python);
LLM labeling is optional (use_llm: false returns clusters + examples for fully
manual coding). Restrict to a subset (e.g. only failed traces) with instance_ids.
This complements the MAST tagging schema: discover the modes, then tag at scale.
API summary¶
| Method | Path | Purpose |
|---|---|---|
| POST | /admin/catalog/api/build |
Build the embedding index over current items |
| POST | /admin/catalog/api/search |
{query\|anchor_id, top_k, threshold} |
| POST | /admin/catalog/api/discover |
{k, instance_ids?, use_llm?} → candidate failure-mode clusters |
| GET/POST | /admin/catalog/api/slices |
List / create slices |
| GET | /admin/catalog/api/slices/<n>/resolve |
Resolve a slice → instance ids |
| DELETE | /admin/catalog/api/slices/<n> |
Delete a slice |
| POST | /admin/catalog/api/slices/<n>/to_dataset |
Curate a slice into a dataset |
Example¶
examples/agent-traces/semantic-curation/ is a runnable demo.
Related¶
- Datasets & Experiments — slice curation target
- Automation Rules — rule-based routing (shares the condition grammar)
- Triage Queue — signal-based prioritization