Constant Sum / Points Allocation
The constant sum schema presents annotators with a fixed budget of points to allocate across a set of categories. Unlike soft labels which are percentage-based, constant sum tasks can use any total and are framed explicitly as an allocation exercise: "distribute 100 points across these topics according to how relevant each is." This framing naturally encourages comparative judgments rather than independent ratings of each category.
Overview
Constant sum (also called budget allocation) is a measurement technique from marketing research adapted for NLP annotation:
- Annotators receive a fixed budget (e.g., 100 points)
- They enter values for each category using number inputs or sliders
- The UI enforces the constraint that all values sum exactly to the total
- Remaining budget is displayed as a running counter
This approach elicits relative importance judgments and prevents scale-usage bias (the tendency to rate everything high or low when using independent scales).
Research Basis
- Louviere et al. (2015). Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press. Discusses constant sum as a foundational comparative measurement technique alongside BWS.
- Thurstone, L. L. (1927). "A Law of Comparative Judgment." Psychological Review 34(4). Foundational work establishing that comparative judgments yield more reliable interval-scale measurements than absolute ratings.
Configuration
Options
| Option | Default | Description |
|---|---|---|
annotation_type |
— | Must be constant_sum |
name |
— | Schema identifier (required) |
description |
— | Task instruction shown to annotators |
labels |
— | List of category names (required, minimum 2) |
total_points |
100 |
Total budget to distribute |
min_per_item |
0 |
Minimum points any single category can receive |
input_type |
"number" |
Input widget: number (text box) or slider |
label_requirement.required |
false |
Require exact allocation before proceeding |
YAML Example — Number Inputs
annotation_schemes:
- annotation_type: constant_sum
name: topic_allocation
description: "Distribute 100 points across the topics present in this article. Give more points to topics that are more prominent."
total_points: 100
min_per_item: 0
input_type: number
labels:
- Politics
- Sports
- Technology
- Entertainment
- Science
label_requirement:
required: true
YAML Example — Slider Inputs
annotation_schemes:
- annotation_type: constant_sum
name: emotion_mix
description: "Distribute 10 points across the emotions expressed in this post."
total_points: 10
min_per_item: 0
input_type: slider
labels:
- Joy
- Anger
- Sadness
- Fear
- Surprise
Requiring Minimum Allocations
Use min_per_item to force annotators to consider every category:
annotation_schemes:
- annotation_type: constant_sum
name: argument_components
description: "Distribute 100 points across argument components. Every component must receive at least 5 points."
total_points: 100
min_per_item: 5
labels:
- Claim
- Evidence
- Warrant
- Rebuttal
Output Format
Annotations are stored as a dictionary mapping each label to its allocated value string:
{
"topic_allocation": {
"Politics": "30",
"Sports": "20",
"Technology": "50",
"Entertainment": "0",
"Science": "0"
}
}
Values are string-encoded integers. The sum of all values equals total_points.
Use Cases
- Multi-topic document labeling — measure the proportion of content devoted to each topic
- Resource/effort allocation — estimate how much attention a document gives to each aspect
- Comparative aspect rating — evaluate which qualities (fluency, relevance, coherence) contribute most to overall quality
- Emotion composition — capture mixed emotional content where multiple emotions co-occur
- Argument mining — quantify the presence of different argument components
Troubleshooting
Annotators find allocation tedious for many categories: Limit to 5–7 categories maximum when using constant sum. For larger label sets consider hierarchical multiselect or a soft label schema.
Sum constraint causes frustration: Make the running balance counter prominent and use the number input type for large totals (easier to correct) and slider for small totals (e.g., 10 points).
Related Documentation
- Soft Label — percentage-based probability distribution
- Best-Worst Scaling — comparative ranking without explicit scores
- Slider — single continuous slider
- Schema Gallery — all annotation types with examples