KOS

crowdkit.aggregation.classification.kos.KOS

KOS(
self,
n_iter: int = 100,
random_state: int = 0
)

The KOS (Karger, Oh, and Shah 2011) aggregation model is an iterative algorithm that estimates the log-likelihood of each task being positive while also modeling the reliability of each worker.

Let $A$ be the matrix of responses, where $A_{ij}$ is the response of worker $j$ to task $i$.

If worker $j$ did not respond to task $i$, then $A_{ij} = 0$. Otherwise, $|A_{ij}| = 1$.
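As a rough illustration of the response matrix described above, the sketch below builds it from a small table of answers with pandas. The column names `task`, `worker`, and `label`, the toy data, and the encoding of a positive answer as +1 and a negative answer as -1 are assumptions made for this example; they are not taken from the library.

```python
import pandas as pd

# Hypothetical toy responses: one row per (task, worker) answer.
responses = pd.DataFrame({
    "task":   ["t1", "t1", "t2", "t2", "t3"],
    "worker": ["w1", "w2", "w1", "w3", "w2"],
    "label":  [1, 0, 0, 0, 1],
})

# Assumed encoding: positive answer -> +1, negative answer -> -1.
signed = responses.assign(value=responses["label"].map({1: 1, 0: -1}))

# Rows are tasks, columns are workers; missing answers become 0.
A = (
    signed.pivot(index="task", columns="worker", values="value")
    .fillna(0)
    .astype(int)
)
```

Every entry of `A` is then -1, 0, or +1, so the condition that answered entries have absolute value 1 holds by construction.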

The algorithm operates on real-valued task messages $x_{i \rightarrow j}$ and worker messages $y_{j \rightarrow i}$. A task message $x_{i \rightarrow j}$ represents the log-likelihood of task $i$ being a positive task, and a worker message $y_{j \rightarrow i}$ represents how reliable worker $j$ is.

At the $k$-th iteration, the values are updated as follows:

$$
x_{i \rightarrow j}^{(k)} = \sum_{j' \in \partial i \backslash j} A_{ij'} y_{j' \rightarrow i}^{(k-1)} \\
y_{j \rightarrow i}^{(k)} = \sum_{i' \in \partial j \backslash i} A_{i'j} x_{i' \rightarrow j}^{(k-1)}
$$

Here $\partial i$ denotes the set of workers that responded to task $i$, and $\partial j$ denotes the set of tasks that worker $j$ responded to.
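The update rules above can be sketched with NumPy on a dense response matrix. This is a minimal illustration under stated assumptions, not the Crowd-Kit implementation: the function name `kos_sketch` is hypothetical, the worker messages are initialized from a normal distribution N(1, 1), the worker update reads the task messages computed in the same iteration, the messages are rescaled each iteration to avoid overflow, and the final label is the sign of the sum of all incoming worker messages.

```python
import numpy as np


def kos_sketch(A, n_iter=10, seed=0):
    """Illustrative KOS-style message passing (hypothetical helper).

    A has shape (n_tasks, n_workers) with entries +1 / -1 for given
    answers and 0 where the worker did not respond.
    """
    A = np.asarray(A, dtype=float)
    mask = A != 0  # edges of the task-worker graph
    rng = np.random.default_rng(seed)

    # Y[i, j] holds y_{j -> i}; assumed N(1, 1) initialization on edges.
    Y = rng.normal(loc=1.0, scale=1.0, size=A.shape) * mask

    for _ in range(n_iter):
        # x_{i -> j}: sum over the task's workers, excluding worker j.
        AY = A * Y
        X = (AY.sum(axis=1, keepdims=True) - AY) * mask

        # y_{j -> i}: sum over the worker's tasks, excluding task i.
        AX = A * X
        Y = (AX.sum(axis=0, keepdims=True) - AX) * mask

        # Rescale to avoid overflow; this does not change any sign.
        scale = np.abs(Y).max()
        if scale > 0:
            Y = Y / scale

    # Assumed decision rule: sign of the sum of all incoming messages.
    scores = (A * Y).sum(axis=1)
    return np.where(scores >= 0, 1, -1)


# Toy data: five truthful workers and one who always flips the answer.
truth = np.array([1, 1, -1, -1, 1, -1, 1, -1])
reliability = np.array([1, 1, 1, 1, 1, -1])
A = np.outer(truth, reliability)
estimate = kos_sketch(A, n_iter=10, seed=0)
```

Excluding the recipient from each sum is done here by subtracting the recipient's own term from the full row or column sum, which keeps the sketch vectorized at the cost of storing messages in dense arrays.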

David R. Karger, Sewoong Oh, and Devavrat Shah. Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems. Operations Research 62.1 (2014), 1-38.

https://arxiv.org/abs/1110.3564

Parameters description

| Parameters | Type | Description |
| --- | --- | --- |
| n_iter | int | The maximum number of iterations. |
| random_state | int | The state of the random number generator. |
| labels_ | Optional[Series] | The task labels. The pandas.Series data is indexed by task so that labels.loc[task] is the most likely true label of that task. |

Examples:

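from crowdkit.aggregation import KOS
from crowdkit.datasets import load_dataset

# df holds the crowd responses; gt holds the ground-truth labels.
df, gt = load_dataset('relevance-2')
kos = KOS(n_iter=10)
# Fit the model and get the aggregated label for each task.
result = kos.fit_predict(df)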

Methods summary

| Method | Description |
| --- | --- |
| fit | Fits the model to the training data. |
| fit_predict | Fits the model to the training data and returns the aggregated results. |

Last updated: March 31, 2023
