Crowd-Kit

Computational quality control for crowdsourcing

Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides related metrics and datasets. We aim to simplify working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with the aggregate;
  • loaders for popular crowdsourced datasets.

Installing

To install Crowd-Kit, run the following command:

pip install crowd-kit

If you also want to use the learning subpackage, run:

pip install crowd-kit[learning]

If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies:

pipenv install --dev

We use pytest for testing.

Getting started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv') # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
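
If the annotations are not in a file yet, the same three-column layout can be built in memory. The sketch below is only an illustration: the task, worker, and label values are made up, and check_columns is a hypothetical helper, not part of Crowd-Kit.

```python
import pandas as pd

# Made-up annotations: each row is one worker's label for one task.
df = pd.DataFrame(
    [
        ("t1", "w1", "cat"),
        ("t1", "w2", "cat"),
        ("t1", "w3", "dog"),
        ("t2", "w1", "dog"),
        ("t2", "w2", "dog"),
        ("t2", "w3", "cat"),
    ],
    columns=["task", "worker", "label"],
)

REQUIRED_COLUMNS = {"task", "worker", "label"}


def check_columns(frame):
    """Hypothetical helper: raise ValueError if a required column is missing."""
    missing = REQUIRED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return frame


check_columns(df)
```

Running a check like this before aggregation makes a misnamed column fail early with a readable message.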

Then, aggregate the workers' responses using the fit_predict method, which follows the scikit-learn naming convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
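
To get a feel for what aggregation produces, a plain majority vote can be written directly in pandas. This is a baseline sketch for comparison, not the Dawid-Skene algorithm and not Crowd-Kit's implementation; the sample data and the majority_vote and agg_label names are assumptions made for illustration. It yields one label per task.

```python
import pandas as pd

# Made-up annotations in the task / worker / label layout.
df = pd.DataFrame(
    [
        ("t1", "w1", "cat"),
        ("t1", "w2", "cat"),
        ("t1", "w3", "dog"),
        ("t2", "w1", "dog"),
        ("t2", "w2", "dog"),
        ("t2", "w3", "cat"),
    ],
    columns=["task", "worker", "label"],
)


def majority_vote(frame):
    """Return the most frequent label for each task.

    Series.mode() returns the most frequent values in sorted order,
    so ties are broken by taking the smallest label.
    """
    winners = frame.groupby("task")["label"].agg(lambda s: s.mode().iloc[0])
    return winners.rename("agg_label")


baseline_labels = majority_vote(df)
print(baseline_labels)
```

Unlike this baseline, which treats every worker equally, Dawid-Skene also estimates how reliable each worker is, so the two can disagree on tasks where the votes are split.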

More usage examples

Implemented aggregation methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical responses

Multi-label responses

Method              Status
Binary Relevance

Textual responses

Method              Status
RASA
HRRASA
ROVER

Image segmentation

Pairwise comparisons

Learning from crowds

Questions and bug reports

  • To report a bug, post an issue on the Toloka/bugreport page.
  • To find answers to common questions or start a new discussion, join our English-speaking Slack community.

Source code

Toloka global community

Stay informed about updates to the platform and open-source libraries — connect with the Toloka team in our Global Community for announcements, discussions, and invites to events.


Last updated: February 6, 2023
