Toloka documentation


crowdkit.postprocessing.entropy_threshold.entropy_threshold

entropy_threshold(
    answers: DataFrame,
    workers_skills: Optional[Series] = None,
    percentile: int = 10,
    min_answers: int = 2
)

Entropy thresholding postprocessing: filters out all answers by workers whose answer entropy (uncertainty) is below the specified percentile.

This heuristic detects answers of workers that answer the same way too often, e.g. when "speed-running" by only clicking one button.
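The heuristic can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: it scores each worker by the Shannon entropy of their label frequencies, ignores workers_skills, treats min_answers as the minimum number of answers a worker needs before being scored, and uses a linearly interpolated percentile as the cutoff. The helper names (label_entropy, percentile_linear, filter_low_entropy_workers) and the worker names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def label_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def percentile_linear(values, q):
    """q-th percentile (0..100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_low_entropy_workers(records, percentile=10, min_answers=2):
    """Drop every answer of workers whose label entropy is below the cutoff."""
    by_worker = defaultdict(list)
    for _task, worker, label in records:
        by_worker[worker].append(label)

    # Assumption: only workers with at least `min_answers` answers are scored.
    entropies = {
        w: label_entropy(ls) for w, ls in by_worker.items() if len(ls) >= min_answers
    }
    if not entropies:
        return list(records)

    cutoff = percentile_linear(entropies.values(), percentile)
    dropped = {w for w, h in entropies.items() if h < cutoff}
    return [r for r in records if r[1] not in dropped]


# (task, worker, label) triples; worker w3 only ever clicks "cat".
records = [
    ("t1", "w1", "dog"), ("t1", "w2", "dog"), ("t1", "w3", "cat"),
    ("t2", "w1", "cat"), ("t2", "w2", "cat"), ("t2", "w3", "cat"),
    ("t3", "w1", "dog"), ("t3", "w2", "cat"), ("t3", "w3", "cat"),
]
kept = filter_low_entropy_workers(records)
print(sorted({worker for _task, worker, _label in kept}))
```

A worker who gives one label every time has zero entropy, so they fall to the bottom of the ranking and are the first to go once any other worker shows some variation.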

Parameters

Parameter        Type              Description
answers          DataFrame         Workers' labeling results. A pandas.DataFrame containing task, worker and label columns.
workers_skills   Optional[Series]  Workers' skills. A pandas.Series indexed by workers and holding each worker's skill.
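As a rough illustration of the input shapes described above, the following sketch builds both arguments with pandas. The worker names and skill values are made up for the example, and the final check is an assumption of this sketch rather than a documented requirement. The commented call at the end only shows how the objects would be passed, based on the signature on this page.

```python
import pandas as pd

# One row per (task, worker) answer, with task, worker and label columns.
answers = pd.DataFrame(
    [
        {"task": "1", "worker": "A", "label": "dog"},
        {"task": "1", "worker": "B", "label": "cat"},
        {"task": "2", "worker": "A", "label": "cat"},
        {"task": "2", "worker": "B", "label": "cat"},
    ]
)

# Hypothetical skill values, indexed by worker.
workers_skills = pd.Series({"A": 0.9, "B": 0.4}, name="skill")
workers_skills.index.name = "worker"

# Sanity check (an assumption of this sketch): every worker has a skill entry.
missing = set(answers["worker"]) - set(workers_skills.index)
print(missing)

# With crowdkit installed, these would be passed as:
# entropy_threshold(answers, workers_skills=workers_skills)
```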

  • Returns:

    The answers that remain after filtering.

  • Return type:

    DataFrame


A fraudulent worker always answers the same and gets filtered out.

import pandas as pd
from crowdkit.postprocessing.entropy_threshold import entropy_threshold
answers = pd.DataFrame.from_records([
    {'task': '1', 'worker': 'A', 'label': frozenset(['dog'])},
    {'task': '1', 'worker': 'B', 'label': frozenset(['cat'])},
    {'task': '2', 'worker': 'A', 'label': frozenset(['cat'])},
    {'task': '2', 'worker': 'B', 'label': frozenset(['cat'])},
    {'task': '3', 'worker': 'A', 'label': frozenset(['dog'])},
    {'task': '3', 'worker': 'B', 'label': frozenset(['cat'])},
])
# Worker B always answers 'cat', so all of B's answers are dropped.
entropy_threshold(answers)

  task worker  label
0    1      A  (dog)
2    2      A  (cat)
4    3      A  (dog)
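The surviving rows can be checked by hand. Assuming the score is the plain Shannon entropy of each worker's label frequencies (how workers_skills enters the library's computation is not spelled out on this page), worker A gives "dog" twice and "cat" once, while worker B gives "cat" three times:

```latex
\begin{align*}
H(A) &= -\left(\tfrac{2}{3}\ln\tfrac{2}{3} + \tfrac{1}{3}\ln\tfrac{1}{3}\right) \approx 0.637 \\
H(B) &= -\,1 \cdot \ln 1 = 0
\end{align*}
```

Under that assumption B has the lowest possible entropy, which is consistent with only A's answers remaining in the output.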