Toloka documentation


crowdkit.aggregation.classification.dawid_skene.OneCoinDawidSkene | Source code

OneCoinDawidSkene(
    n_iter: int = 100,
    tol: float = 1e-05
)

One-coin Dawid-Skene aggregation model.

This model works exactly like the original Dawid-Skene model, which is based on the EM algorithm, except for how the workers' errors are calculated on the M-step.

First, the workers' skills are calculated as their accuracy in accordance with the label probabilities. Let $e^w$ be a worker's confusion (error) matrix of size $K \times K$ in the case of $K$-class classification, $p$ be a vector of prior class probabilities, $z_j$ be the true label of task $j$, and $y^w_j$ be the worker's answer for task $j$. Let $s_{w}$ be the worker's skill (accuracy). Then the error is

$$e^w_{j,z_j} = \begin{cases} s_{w} & y^w_j = z_j \\ \frac{1 - s_{w}}{K - 1} & y^w_j \neq z_j \end{cases}$$
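The formula above can be sketched in a few lines of plain Python. This is only an illustration, not Crowd-Kit's code: the function name `one_coin_error_matrix` and the nested-list layout (rows are observed labels, columns are true labels) are assumptions made for this example.

```python
def one_coin_error_matrix(skill, n_classes):
    """Build a K x K matrix e[observed][true] from a single skill value."""
    if n_classes < 2:
        raise ValueError("the one-coin model needs at least two classes")
    # Every wrong answer shares the remaining probability mass equally.
    off_diagonal = (1.0 - skill) / (n_classes - 1)
    return [
        [skill if observed == true else off_diagonal for true in range(n_classes)]
        for observed in range(n_classes)
    ]


# A worker with skill 0.8 in a 3-class task.
matrix = one_coin_error_matrix(skill=0.8, n_classes=3)
```

Each column of the matrix sums to 1, so for any true label it is a valid distribution over the worker's possible answers.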

Parameters Description

Parameters Type Description
n_iter int

The number of EM iterations.

labels_ Optional[Series]

Tasks' labels. A pandas.Series indexed by task such that labels.loc[task] is the task's most likely true label.

probas_ DataFrame

Tasks' label probability distributions. A pandas.DataFrame indexed by task such that result.loc[task, label] is the probability that the task's true label is equal to label. Each probability is between 0 and 1, and all probabilities of a task sum to 1.

priors_ Series

A prior label distribution. A pandas.Series indexed by label and holding the corresponding label's probability of occurrence. Each probability is between 0 and 1, and all probabilities sum to 1.

errors_ DataFrame

Workers' error matrices. A pandas.DataFrame indexed by worker and label with a column for every label_id found in data such that result.loc[worker, observed_label, true_label] is the probability of the worker producing observed_label given that the task's true label is true_label.

skills_ Series

Workers' skills. A pandas.Series indexed by worker and holding the corresponding worker's skill.
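To make the relationship between the fitted attributes more concrete, here is a minimal pure-Python sketch of a one-coin EM loop. It is a simplified illustration under stated assumptions, not Crowd-Kit's implementation: the function name `one_coin_em`, the `(task, worker, label)` tuple input, the `eps` clipping, and the fixed iteration count (no `tol`-based stopping) are all choices made for this example. It assumes at least two distinct labels appear in the data.

```python
import math
from collections import defaultdict


def one_coin_em(answers, n_iter=100, eps=1e-9):
    """Illustrative one-coin EM; answers is a list of (task, worker, label)."""
    classes = sorted({label for _, _, label in answers})
    k = len(classes)  # assumed to be at least 2
    by_task = defaultdict(list)
    for task, worker, label in answers:
        by_task[task].append((worker, label))

    # Initialise the task posteriors with normalised vote counts.
    probas = {}
    for task, votes in by_task.items():
        votes_per_class = {c: 0.0 for c in classes}
        for _, label in votes:
            votes_per_class[label] += 1.0
        probas[task] = {c: votes_per_class[c] / len(votes) for c in classes}

    priors, skills = {}, {}
    for _ in range(n_iter):
        # M-step: priors are the mean posteriors; a worker's skill is the mean
        # posterior probability of the labels that worker submitted.
        priors = {c: sum(p[c] for p in probas.values()) / len(probas) for c in classes}
        totals, n_answers = defaultdict(float), defaultdict(int)
        for task, worker, label in answers:
            totals[worker] += probas[task][label]
            n_answers[worker] += 1
        skills = {w: totals[w] / n_answers[w] for w in totals}

        # E-step: recompute posteriors from priors and one-coin error rates.
        for task, votes in by_task.items():
            log_post = {}
            for c in classes:
                total = math.log(max(priors[c], eps))
                for worker, label in votes:
                    s = min(max(skills[worker], eps), 1.0 - eps)
                    total += math.log(s if label == c else (1.0 - s) / (k - 1))
                log_post[c] = total
            top = max(log_post.values())
            weights = {c: math.exp(v - top) for c, v in log_post.items()}
            norm = sum(weights.values())
            probas[task] = {c: weights[c] / norm for c in classes}

    labels = {task: max(p, key=p.get) for task, p in probas.items()}
    return labels, probas, priors, skills


# Toy data: workers "a" and "b" always agree; "c" disagrees on two tasks.
answers = [
    ("t1", "a", "cat"), ("t1", "b", "cat"), ("t1", "c", "dog"),
    ("t2", "a", "dog"), ("t2", "b", "dog"), ("t2", "c", "cat"),
    ("t3", "a", "cat"), ("t3", "b", "cat"), ("t3", "c", "cat"),
    ("t4", "a", "dog"), ("t4", "b", "dog"), ("t4", "c", "dog"),
]
labels, probas, priors, skills = one_coin_em(answers, n_iter=20)
```

In this sketch, `labels`, `probas`, `priors`, and `skills` play the roles of labels_, probas_, priors_, and skills_. The error matrices are not stored because, in the one-coin model, each of them is fully determined by the worker's skill.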


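from crowdkit.aggregation import OneCoinDawidSkene
from crowdkit.datasets import load_dataset

# Load the crowdsourced answers (df) and the ground-truth labels (gt).
df, gt = load_dataset('relevance-2')
# Run at most 100 EM iterations.
model = OneCoinDawidSkene(n_iter=100)
result = model.fit_predict(df)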
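The example above loads `gt` but does not use it. One possible follow-up is to measure how often the aggregated labels agree with the ground truth. The helper below is a hypothetical sketch, not part of Crowd-Kit: it works on any mapping keyed by task, and it should also accept `result` and `gt` as pandas.Series, assuming both are indexed by task with unique index values.

```python
def accuracy(predicted, truth):
    """Share of ground-truth tasks whose predicted label matches."""
    # Only tasks present in both mappings are compared.
    shared = [task for task in truth.keys() if task in predicted]
    if not shared:
        raise ValueError("no tasks in common")
    hits = sum(predicted[task] == truth[task] for task in shared)
    return hits / len(shared)


# Toy stand-ins for the aggregated labels and the ground truth.
predicted = {"t1": 1, "t2": 0, "t3": 1}
truth = {"t1": 1, "t2": 1, "t4": 0}
score = accuracy(predicted, truth)  # t1 matches, t2 does not, t4 is skipped
```

With the objects from the example above, the call would be `accuracy(result, gt)`.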

Methods Summary

Method Description
fit Fit the model through the EM-algorithm.