DawidSkene

crowdkit.aggregation.classification.dawid_skene.DawidSkene

DawidSkene(self, n_iter: int)

Dawid-Skene aggregation model

A. Philip Dawid and Allan M. Skene. 1979. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, 1 (1979), 20–28.

https://doi.org/10.2307/2346806

Parameters Description

Parameters    Type    Description

labels_    Optional[Series]

Task labels. A pandas.Series indexed by task such that labels.loc[task] is the task's most likely true label.

probas_    Optional[DataFrame]

Task label probability distributions. A pandas.DataFrame indexed by task such that result.loc[task, label] is the probability that the task's true label equals label. Each probability is between 0 and 1, and the probabilities for each task should sum to 1.

priors_    Optional[Series]

A prior label distribution. A pandas.Series indexed by label that holds each label's probability of occurrence. Each probability is between 0 and 1, and all probabilities should sum to 1.

errors_    Optional[DataFrame]

Performers' error matrices. A pandas.DataFrame indexed by performer and label, with a column for every label_id found in data, such that result.loc[performer, observed_label, true_label] is the probability of the performer producing observed_label given that the task's true label is true_label.
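The relationship between the four fitted attributes can be illustrated with a small EM loop. The following is a minimal pure-Python sketch of the Dawid-Skene procedure, not the library's implementation: the function name dawid_skene_sketch, the (task, performer, label) tuple input, the vote-share initialisation, and the nested-dict outputs are all assumptions made for illustration, whereas the library returns pandas objects.

```python
def dawid_skene_sketch(answers, n_iter=10):
    """Toy Dawid-Skene EM over (task, performer, label) tuples (hypothetical helper)."""
    tasks = sorted({t for t, _, _ in answers})
    performers = sorted({p for _, p, _ in answers})
    classes = sorted({l for _, _, l in answers})

    def normalise(dist):
        total = sum(dist.values())
        return {k: v / total for k, v in dist.items()}

    def m_step(probas):
        # Priors: average of the per-task label distributions.
        priors = {z: sum(probas[t][z] for t in tasks) / len(tasks) for z in classes}
        # errors[performer][observed][true]: soft counts of observed labels,
        # weighted by the current belief that the true label is `true`.
        errors = {p: {o: {z: 0.0 for z in classes} for o in classes} for p in performers}
        for t, p, o in answers:
            for z in classes:
                errors[p][o][z] += probas[t][z]
        # Normalise so that, for each true label, observed labels sum to 1.
        for p in performers:
            for z in classes:
                column = sum(errors[p][o][z] for o in classes)
                if column > 0:
                    for o in classes:
                        errors[p][o][z] /= column
        return priors, errors

    def e_step(priors, errors):
        # Posterior over true labels: prior times the likelihood of every answer.
        scores = {t: dict(priors) for t in tasks}
        for t, p, o in answers:
            for z in classes:
                scores[t][z] *= errors[p][o][z]
        return {t: normalise(scores[t]) for t in tasks}

    # Assumed initialisation: per-task vote shares.
    votes = {t: {z: 0.0 for z in classes} for t in tasks}
    for t, _, o in answers:
        votes[t][o] += 1.0
    probas = {t: normalise(votes[t]) for t in tasks}
    priors, errors = m_step(probas)

    for _ in range(n_iter):
        probas = e_step(priors, errors)
        priors, errors = m_step(probas)

    # Most likely true label per task.
    labels = {t: max(classes, key=lambda z: probas[t][z]) for t in tasks}
    return labels, probas, priors, errors


# Made-up toy data: three performers label three tasks.
answers = [
    ("t1", "p1", "cat"), ("t1", "p2", "cat"), ("t1", "p3", "dog"),
    ("t2", "p1", "dog"), ("t2", "p2", "dog"), ("t2", "p3", "dog"),
    ("t3", "p1", "cat"), ("t3", "p2", "cat"), ("t3", "p3", "cat"),
]
labels, probas, priors, errors = dawid_skene_sketch(answers, n_iter=10)
print(labels)
```

In this sketch, labels, probas, priors, and errors play the roles of labels_, probas_, priors_, and errors_ respectively, and errors[performer][observed_label][true_label] mirrors the result.loc[performer, observed_label, true_label] lookup described above.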
