Crowd-Kit

Computational quality control for crowdsourcing

Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the related metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with the aggregate;
  • loaders for popular crowdsourced datasets.
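
To give a feel for what an uncertainty metric measures, here is a minimal sketch that averages the Shannon entropy of the labels each task received. It is an illustration written with the standard library only, not Crowd-Kit's implementation; the helper names label_entropy and mean_task_uncertainty and the sample annotations are hypothetical.

```python
import math
from collections import Counter, defaultdict


def label_entropy(labels):
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_task_uncertainty(annotations):
    """Average per-task label entropy over (task, worker, label) triples."""
    by_task = defaultdict(list)
    for task, _worker, label in annotations:
        by_task[task].append(label)
    return sum(label_entropy(v) for v in by_task.values()) / len(by_task)


# Hypothetical annotations: workers agree on t1 and split evenly on t2.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "cat"),
    ("t2", "w1", "cat"), ("t2", "w2", "dog"),
]
uncertainty = mean_task_uncertainty(annotations)
```

A task on which all workers agree contributes zero entropy, so higher values indicate more disagreement among workers.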

Installing

To install Crowd-Kit, run the following command:

pip install crowd-kit

If you also want to use the learning subpackage, run:

pip install crowd-kit[learning]

If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies:

pipenv install --dev

We use pytest for testing.

Getting started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd

Then, you need to read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv') # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
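
Before loading your own file, it can help to confirm that it has the three expected columns. The following is a minimal sketch using only the standard library; the helper missing_columns and the inline sample data are hypothetical and are not part of Crowd-Kit.

```python
import csv
import io

# Column names expected by the example above.
REQUIRED_COLUMNS = {"task", "worker", "label"}


def missing_columns(csv_file):
    """Return the required column names absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])


# Hypothetical file contents: one row per (task, worker) response.
sample = io.StringIO("task,worker,label\nt1,w1,cat\nt1,w2,dog\n")
missing = missing_columns(sample)
```

With a real file, pass an open file object instead of the in-memory sample, for example open('results.csv', newline='').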

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
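
For intuition about what aggregation returns (one label per task), the simplest baseline is a majority vote. The sketch below is plain Python written for illustration; it is neither Crowd-Kit's implementation nor the Dawid-Skene algorithm, which additionally estimates per-worker error rates. The function name majority_vote and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Pick the most frequent label per task from (task, worker, label) triples."""
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1
    # On a tie, the label seen first for that task wins.
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}


# Hypothetical responses from three workers on two tasks.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
baseline_labels = majority_vote(annotations)
```

Comparing such a baseline with the output of a model-based aggregator is a quick sanity check: large differences usually point to tasks where workers disagree or where some workers are systematically unreliable.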

More usage examples

Implemented aggregation methods

Below is the list of methods, each marked as already available (✅) or in progress (🟡).

Categorical responses

Multi-label responses

Method          Status
Binary Relevance

Textual responses

Method          Status
RASA
HRRASA
ROVER

Image segmentation

Pairwise comparisons

Learning from crowds

Questions and bug reports

Source code

Last updated: February 6, 2023
