Computational quality control for crowdsourcing
Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
To install Crowd-Kit, run the following command:
pip install crowd-kit
If you also want to use the `learning` subpackage, run:
pip install crowd-kit[learning]
If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies:
pipenv install --dev
We use pytest for testing.
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, let us do all the necessary imports.
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then, read your annotations into a pandas DataFrame with the columns `task`, `worker`, and `label`. Alternatively, you can download an example dataset:
df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
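To make the expected layout concrete, here is a minimal sketch that writes a few annotations in the `task`, `worker`, `label` format using only the standard library. The task IDs, worker IDs, labels, and the helper name `write_annotations` are made up for illustration and are not part of Crowd-Kit.

```python
import csv
import io

# Hypothetical annotations: each row is one worker's label for one task.
ROWS = [
    ("t1", "w1", "relevant"),
    ("t1", "w2", "relevant"),
    ("t1", "w3", "irrelevant"),
    ("t2", "w1", "irrelevant"),
    ("t2", "w2", "irrelevant"),
]


def write_annotations(stream, rows):
    """Write a header plus one (task, worker, label) row per annotation."""
    writer = csv.writer(stream)
    writer.writerow(["task", "worker", "label"])
    writer.writerows(rows)


# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_annotations(buffer, ROWS)
print(buffer.getvalue())
```

Passing a file opened with `open('results.csv', 'w', newline='')` instead of the buffer would produce a file in the long format described above: one row per individual response rather than one row per task.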
Then, you can aggregate the workers' responses using the `fit_predict` method, which follows the scikit-learn estimator convention:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
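For intuition about what this aggregation computes, the following is a toy pure-Python sketch of the Dawid-Skene expectation-maximization idea. It is not Crowd-Kit's implementation: the function name `dawid_skene`, the `smoothing` parameter, and the sample annotations are assumptions made for illustration, and the code is not numerically robust for large inputs.

```python
from collections import defaultdict


def dawid_skene(annotations, n_iter=100, smoothing=1e-6):
    """Toy Dawid-Skene EM over (task, worker, label) triples."""
    labels = sorted({label for _, _, label in annotations})
    tasks = sorted({task for task, _, _ in annotations})
    workers = sorted({worker for _, worker, _ in annotations})

    by_task = defaultdict(list)
    for task, worker, label in annotations:
        by_task[task].append((worker, label))

    # Initialization: per-task label probabilities from vote shares.
    probs = {}
    for task in tasks:
        votes = by_task[task]
        probs[task] = {
            l: sum(1 for _, obs in votes if obs == l) / len(votes)
            for l in labels
        }

    for _ in range(n_iter):
        # M-step: label priors and per-worker confusion matrices.
        priors = {l: sum(probs[t][l] for t in tasks) / len(tasks) for l in labels}
        confusion = {
            w: {true: {obs: smoothing for obs in labels} for true in labels}
            for w in workers
        }
        for task, worker, obs in annotations:
            for true in labels:
                confusion[worker][true][obs] += probs[task][true]
        for w in workers:
            for true in labels:
                total = sum(confusion[w][true].values())
                for obs in labels:
                    confusion[w][true][obs] /= total

        # E-step: posterior over the true label of each task.
        for task in tasks:
            scores = {}
            for true in labels:
                score = priors[true]
                for worker, obs in by_task[task]:
                    score *= confusion[worker][true][obs]
                scores[true] = score
            total = sum(scores.values())
            probs[task] = {l: scores[l] / total for l in labels}

    # Pick the most probable label for every task.
    return {task: max(labels, key=lambda l: probs[task][l]) for task in tasks}


# Hypothetical data: w1 and w2 always agree, w3 disagrees on t1 and t4.
annotations = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
    ("t3", "w1", "cat"), ("t3", "w2", "cat"), ("t3", "w3", "cat"),
    ("t4", "w1", "dog"), ("t4", "w2", "dog"), ("t4", "w3", "cat"),
]
result = dawid_skene(annotations, n_iter=20)
print(result)
```

The sketch alternates between estimating each worker's confusion matrix from the current label beliefs and updating those beliefs from the confusion matrices, so workers who agree with the emerging consensus gain more influence than under a plain majority vote.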
Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Method | Status |
---|---|
Majority Vote | ✅ |
One-coin Dawid-Skene | ✅ |
Dawid-Skene | ✅ |
Gold Majority Vote | ✅ |
M-MSR | ✅ |
Wawa | ✅ |
Zero-Based Skill | ✅ |
GLAD | ✅ |
KOS | ✅ |
MACE | ✅ |
BCC | 🟡 |

Method | Status |
---|---|
Binary Relevance | ✅ |

Method | Status |
---|---|
Segmentation MV | ✅ |
Segmentation RASA | ✅ |
Segmentation EM | ✅ |

Method | Status |
---|---|
Bradley-Terry | ✅ |
Noisy Bradley-Terry | ✅ |

Method | Status |
---|---|
CrowdLayer | ✅ |
CoNAL | ✅ |
TextSummarization | ✅ |
Stay informed about updates to the platform and open-source libraries: connect with the Toloka team in our Global Community for announcements, discussions, and invites to events.