entropy_threshold

crowdkit.postprocessing.entropy_threshold.entropy_threshold

entropy_threshold(
    answers: DataFrame,
    workers_skills: Optional[Series] = None,
    percentile: int = 10,
    min_answers: int = 2
)

Entropy thresholding postprocessing: filters out all answers from workers whose answer entropy (uncertainty) is below the specified percentile.

This heuristic detects workers who give the same answer too often, e.g. when "speed-running" a task by clicking only one button.
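The idea can be sketched with plain pandas and NumPy. This is a minimal illustration under stated assumptions, not Crowd-Kit's implementation: the helper names `worker_label_entropy` and `filter_low_entropy_workers` are hypothetical, entropy is taken as the Shannon entropy of each worker's empirical label distribution, worker skills are ignored, labels are plain strings, and `min_answers` is assumed to mean that workers with fewer answers are never evaluated.

```python
import numpy as np
import pandas as pd


def worker_label_entropy(answers: pd.DataFrame) -> pd.Series:
    """Shannon entropy (in nats) of each worker's empirical label distribution."""
    # Number of times each worker chose each label.
    counts = answers.groupby(["worker", "label"]).size()
    # Turn counts into per-worker probabilities.
    probs = counts / counts.groupby(level="worker").transform("sum")
    return -(probs * np.log(probs)).groupby(level="worker").sum()


def filter_low_entropy_workers(
    answers: pd.DataFrame, percentile: int = 10, min_answers: int = 2
) -> pd.DataFrame:
    """Drop all answers of workers whose entropy is below the given percentile."""
    # Assumption: workers with fewer than `min_answers` answers are not evaluated.
    answers_per_worker = answers.groupby("worker").size()
    eligible = answers_per_worker[answers_per_worker >= min_answers].index
    entropy = worker_label_entropy(answers[answers["worker"].isin(eligible)])
    if entropy.empty:
        return answers
    # Entropy value at the requested percentile of the per-worker distribution.
    cutoff = np.percentile(entropy, percentile)
    flagged = entropy[entropy < cutoff].index
    return answers[~answers["worker"].isin(flagged)]


answers = pd.DataFrame.from_records(
    [
        {"task": "1", "worker": "A", "label": "dog"},
        {"task": "1", "worker": "B", "label": "cat"},
        {"task": "2", "worker": "A", "label": "cat"},
        {"task": "2", "worker": "B", "label": "cat"},
        {"task": "3", "worker": "A", "label": "dog"},
        {"task": "3", "worker": "B", "label": "cat"},
    ]
)
# Worker B always answers "cat", so B's entropy is zero and B is dropped.
kept = filter_low_entropy_workers(answers)
```

In this sketch the comparison against the cutoff is strict (`<`), following the wording "below the specified percentile"; whether the library treats ties the same way is not stated here.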

Parameters description

Parameters	Type	Description
`answers`	DataFrame	Workers' labeling results. A pandas.DataFrame containing `task`, `worker` and `label` columns.
`workers_skills`	Optional[Series]	Workers' skills. A pandas.Series indexed by worker and holding each worker's skill.

Returns:

pd.DataFrame
Return type:

DataFrame

Examples:

A fraudulent worker (here, worker B) always gives the same answer and is filtered out.

import pandas as pd
from crowdkit.postprocessing.entropy_threshold import entropy_threshold

# Worker A varies their answers; worker B always answers 'cat'.
answers = pd.DataFrame.from_records(
    [
        {'task': '1', 'worker': 'A', 'label': frozenset(['dog'])},
        {'task': '1', 'worker': 'B', 'label': frozenset(['cat'])},
        {'task': '2', 'worker': 'A', 'label': frozenset(['cat'])},
        {'task': '2', 'worker': 'B', 'label': frozenset(['cat'])},
        {'task': '3', 'worker': 'A', 'label': frozenset(['dog'])},
        {'task': '3', 'worker': 'B', 'label': frozenset(['cat'])},
    ]
)
entropy_threshold(answers)

  task worker  label
0    1      A  (dog)
2    2      A  (cat)
4    3      A  (dog)

Last updated: March 31, 2023
