Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses
  • metrics of uncertainty, consistency, and agreement with the aggregate
  • loaders for popular crowdsourced datasets
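To illustrate what an uncertainty metric captures, here is a minimal sketch that scores each task by the Shannon entropy of its observed label distribution. The helper `label_entropy` and its input format (a list of `(task, worker, label)` tuples) are hypothetical, chosen for illustration only; this is not Crowd-Kit's own implementation, which may normalize or weight labels differently.

```python
import math
from collections import Counter, defaultdict


def label_entropy(annotations):
    """Hypothetical helper: per-task Shannon entropy (natural log) of observed labels.

    annotations: iterable of (task, worker, label) tuples.
    Returns task -> entropy; 0.0 means every worker gave the same label.
    """
    labels_by_task = defaultdict(list)
    for task, _worker, label in annotations:
        labels_by_task[task].append(label)

    entropy = {}
    for task, labels in labels_by_task.items():
        total = len(labels)
        # Empirical probability of each distinct label on this task.
        probs = [count / total for count in Counter(labels).values()]
        entropy[task] = sum(-p * math.log(p) for p in probs)
    return entropy


annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "cat"),
    ("t2", "w1", "cat"), ("t2", "w2", "dog"),
]
print(label_entropy(annotations))
```

Task t1, where all workers agree, gets an entropy of 0, while the evenly split task t2 gets ln 2 (about 0.693), the maximum for two labels.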

Installing

Installing Crowd-Kit is as easy as running:

pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the worker responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
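To give a feel for what Dawid-Skene estimates, here is a compact, stdlib-only sketch of the classical EM procedure: start from vote shares, then alternate between estimating class priors and a confusion matrix per worker (M-step) and recomputing each task's label posterior (E-step). The function `dawid_skene`, its tuple input, and the `eps` smoothing are assumptions for illustration; Crowd-Kit's `DawidSkene` is the supported implementation and may differ in initialization, smoothing, and stopping criteria.

```python
import math
from collections import defaultdict


def dawid_skene(annotations, n_iter=100, eps=1e-8):
    """Minimal Dawid-Skene EM sketch (hypothetical helper, not Crowd-Kit's code).

    annotations: list of (task, worker, label) tuples.
    Returns (hard label per task, posterior distribution per task).
    """
    tasks = sorted({t for t, _, _ in annotations})
    classes = sorted({l for _, _, l in annotations})

    # Initialize task posteriors with the share of votes each class received.
    post = {t: {c: 0.0 for c in classes} for t in tasks}
    for t, _, l in annotations:
        post[t][l] += 1.0
    for t in tasks:
        total = sum(post[t].values())
        post[t] = {c: v / total for c, v in post[t].items()}

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices
        # conf[w][true][observed], smoothed by eps so no entry is zero.
        priors = {c: sum(post[t][c] for t in tasks) / len(tasks) for c in classes}
        conf = defaultdict(lambda: {a: {b: eps for b in classes} for a in classes})
        for t, w, observed in annotations:
            for true in classes:
                conf[w][true][observed] += post[t][true]
        for w in conf:
            for true in classes:
                row_sum = sum(conf[w][true].values())
                conf[w][true] = {b: v / row_sum for b, v in conf[w][true].items()}

        # E-step: recompute posteriors in log space for numerical stability.
        log_post = {t: {c: math.log(priors[c] + eps) for c in classes} for t in tasks}
        for t, w, observed in annotations:
            for true in classes:
                log_post[t][true] += math.log(conf[w][true][observed])
        for t in tasks:
            top = max(log_post[t].values())
            unnorm = {c: math.exp(v - top) for c, v in log_post[t].items()}
            z = sum(unnorm.values())
            post[t] = {c: v / z for c, v in unnorm.items()}

    hard = {t: max(post[t], key=post[t].get) for t in tasks}
    return hard, post


annotations = [
    ("t1", "w1", "yes"), ("t1", "w2", "yes"), ("t1", "w3", "no"),
    ("t2", "w1", "no"), ("t2", "w2", "no"), ("t2", "w3", "no"),
    ("t3", "w1", "yes"), ("t3", "w2", "yes"), ("t3", "w3", "no"),
]
ds_labels, ds_posteriors = dawid_skene(annotations, n_iter=50)
print(ds_labels)
```

In this toy data, worker w3 answers "no" to every task, so its confusion matrix ends up nearly identical for both true classes and its votes carry little weight; t1 and t3 resolve to "yes" with high posterior probability.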

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including those already available (✅) and those in progress (🟡).

Categorical Responses

Method Status
Majority Vote ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
BCC 🟡
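The simplest categorical methods in the table above can be sketched in a few lines. Below, `majority_vote` counts (optionally weighted) votes per task, and `wawa` follows the idea behind Wawa (Worker Agreement with Aggregate): take each worker's agreement rate with the plain majority vote as their skill, then rerun a vote weighted by those skills. Both helpers, the tuple input, and the alphabetical tie-breaking rule are hypothetical choices for illustration, not Crowd-Kit's API.

```python
from collections import Counter, defaultdict


def majority_vote(annotations, weights=None):
    """Hypothetical helper: (weighted) majority vote over (task, worker, label) tuples.

    Ties are broken by the label that sorts first, so results are deterministic.
    """
    scores = defaultdict(Counter)
    for task, worker, label in annotations:
        w = 1.0 if weights is None else weights.get(worker, 0.0)
        scores[task][label] += w
    return {
        task: min(counter, key=lambda lab: (-counter[lab], lab))
        for task, counter in scores.items()
    }


def wawa(annotations):
    """Sketch of Wawa: weight each worker by agreement with the plain majority vote."""
    mv = majority_vote(annotations)
    agree, total = Counter(), Counter()
    for task, worker, label in annotations:
        total[worker] += 1
        agree[worker] += int(label == mv[task])
    skills = {w: agree[w] / total[w] for w in total}
    return majority_vote(annotations, weights=skills), skills


annotations = [
    (t, w, lab)
    for t in ("t1", "t2", "t3")
    for w, lab in (("w1", "a"), ("w2", "a"), ("w3", "a"), ("w4", "b"), ("w5", "b"))
] + [("t4", "w1", "a"), ("t4", "w4", "b"), ("t4", "w5", "b")]

print(majority_vote(annotations)["t4"])
print(wawa(annotations)[0]["t4"])
```

On task t4, plain majority vote picks "b" (two votes to one), but Wawa has learned that w4 and w5 usually disagree with the consensus (skill 0.25 each) while w1 usually agrees (skill 0.75), so the weighted vote picks "a".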

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅
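The idea behind pixel-wise majority voting for segmentation can be sketched without any image library: a pixel belongs to the aggregated mask when more than half of the workers' binary masks include it. The helper `segmentation_majority_vote`, the list-of-lists mask format, and the strict "more than half" threshold are assumptions for illustration; Crowd-Kit's segmentation methods have their own input format and thresholds.

```python
def segmentation_majority_vote(masks):
    """Hypothetical pixel-wise majority vote over equally sized binary masks.

    masks: list of 2-D lists of 0/1 values, one mask per worker.
    A pixel is set in the result when strictly more than half of the masks set it.
    """
    if not masks:
        raise ValueError("need at least one mask")
    height = len(masks[0])
    width = len(masks[0][0]) if height else 0
    if any(len(m) != height or any(len(row) != width for row in m) for m in masks):
        raise ValueError("all masks must have the same shape")
    n = len(masks)
    return [
        [int(2 * sum(m[i][j] for m in masks) > n) for j in range(width)]
        for i in range(height)
    ]


masks = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
print(segmentation_majority_vote(masks))
```

Comparing `2 * votes > n` instead of `votes > n / 2` keeps the threshold check in integer arithmetic.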

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅
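For pairwise comparisons, the Bradley-Terry model assigns each item a positive strength p_i so that item i beats item j with probability p_i / (p_i + p_j). A standard way to fit it is the classical MM (Zermelo) iteration, p_i ← W_i / Σ_j n_ij / (p_i + p_j), where W_i is the number of wins of item i and n_ij is the number of comparisons between i and j. The sketch below implements that iteration in plain Python; the helper `bradley_terry` and its `(winner, loser)` input are hypothetical, and Crowd-Kit's `BradleyTerry` may use a different input format and optimization procedure.

```python
from collections import Counter, defaultdict


def bradley_terry(comparisons, n_iter=100):
    """Hypothetical Bradley-Terry fit via the classical MM (Zermelo) iteration.

    comparisons: list of (winner, loser) pairs; assumes winner != loser.
    Returns item -> strength, normalized to sum to 1; higher means preferred.
    """
    wins = Counter(winner for winner, _ in comparisons)
    pair_counts = Counter(frozenset(pair) for pair in comparisons)
    items = sorted({x for pair in comparisons for x in pair})

    # For each item, list (opponent, number of comparisons with that opponent).
    neighbors = defaultdict(list)
    for pair, n in pair_counts.items():
        a, b = tuple(pair)
        neighbors[a].append((b, n))
        neighbors[b].append((a, n))

    p = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        new_p = {}
        for i in items:
            denom = sum(n / (p[i] + p[j]) for j, n in neighbors[i])
            new_p[i] = wins[i] / denom
        # Strengths are only identified up to scale, so renormalize each step.
        total = sum(new_p.values())
        p = {i: v / total for i, v in new_p.items()}
    return p


comparisons = [
    ("a", "b"), ("a", "b"), ("b", "a"),
    ("b", "c"), ("b", "c"), ("c", "b"),
    ("a", "c"), ("a", "c"), ("c", "a"),
]
strengths = bradley_terry(comparisons)
print(sorted(strengths, key=strengths.get, reverse=True))
```

Each pair here is compared equally often, so the fitted strengths rank the items in the same order as their win counts (a, then b, then c). Items that never win end up with zero strength under plain maximum likelihood, which is one motivation for regularized or noise-aware variants.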

Questions and Bug Reports

Source Code

Toloka Global Community

Stay informed about updates to the platform and open-source libraries: connect with the Toloka team in our Global Community for announcements, discussions, and invitations to events.

License

© YANDEX LLC, 2020-2022. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.