crowdkit.aggregation.classification.dawid_skene.DawidSkene
DawidSkene(self, n_iter: int = 100, tol: float = 1e-05)
Dawid-Skene aggregation model.
Probabilistic model that parametrizes workers' level of expertise through confusion matrices.
Let $e^w$ be a worker's confusion (error) matrix of size $K \times K$ in the case of $K$-class classification, $p$ be the vector of prior class probabilities, $z_j$ be the true label of task $j$, and $y_j^w$ be worker $w$'s answer for task $j$. The relationships between these parameters are represented by a latent label model.
Here the prior true label probability is $\operatorname{Pr}(z_j = c) = p[c]$,
and the distribution of the worker's responses given the true label $c$ is represented by the corresponding column of the error matrix: $\operatorname{Pr}(y_j^w = k \mid z_j = c) = e^w[k, c]$.
The parameters $p$ and $e^w$ and the latent variables $z$ are optimized through the Expectation-Maximization algorithm.
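To make the EM procedure concrete, here is a minimal pure-Python sketch of the model described above. It is not crowdkit's implementation: the function name `dawid_skene_em`, the input format (a list of `(task, worker, label)` triples), the vote-share initialisation, the uniform fallback for empty columns, and the log-likelihood stopping rule are all assumptions made for illustration.

```python
import math


def dawid_skene_em(answers, n_iter=100, tol=1e-5):
    """Hypothetical EM sketch; `answers` holds (task, worker, label) triples."""
    answers = list(answers)
    tasks = sorted({t for t, _, _ in answers})
    workers = sorted({w for _, w, _ in answers})
    labels = sorted({k for _, _, k in answers})

    # Assumed initialisation: each task's posterior is its share of votes.
    probas = {t: dict.fromkeys(labels, 0.0) for t in tasks}
    for t, _, k in answers:
        probas[t][k] += 1.0
    for t in tasks:
        total = sum(probas[t].values())
        probas[t] = {c: v / total for c, v in probas[t].items()}

    prev_loglik = -math.inf
    for _ in range(n_iter):
        # M-step for the priors: p[c] is the mean posterior mass of class c.
        priors = {c: sum(probas[t][c] for t in tasks) / len(tasks) for c in labels}

        # M-step for the error matrices: errors[w][k][c] ~ Pr(y = k | z = c).
        errors = {w: {k: dict.fromkeys(labels, 0.0) for k in labels} for w in workers}
        for t, w, k in answers:
            for c in labels:
                errors[w][k][c] += probas[t][c]
        for w in workers:
            for c in labels:
                column_sum = sum(errors[w][k][c] for k in labels)
                for k in labels:
                    if column_sum > 0:
                        errors[w][k][c] /= column_sum
                    else:
                        # Assumed fallback: uniform column when class c has no mass.
                        errors[w][k][c] = 1.0 / len(labels)

        # E-step: posterior of z_j is proportional to p[c] * prod_w e^w[y_j^w, c].
        joint = {t: dict(priors) for t in tasks}
        for t, w, k in answers:
            for c in labels:
                joint[t][c] *= errors[w][k][c]
        loglik = 0.0
        for t in tasks:
            evidence = sum(joint[t].values())
            probas[t] = {c: v / evidence for c, v in joint[t].items()}
            loglik += math.log(evidence)

        # Assumed stopping rule: change in log-likelihood below `tol`.
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik

    best = {t: max(probas[t], key=probas[t].get) for t in tasks}
    return best, probas, priors, errors


# Toy data: workers "a" and "b" agree with each other, worker "c" often disagrees.
answers = [
    ("t1", "a", "cat"), ("t1", "b", "cat"), ("t1", "c", "dog"),
    ("t2", "a", "dog"), ("t2", "b", "dog"), ("t2", "c", "cat"),
    ("t3", "a", "cat"), ("t3", "b", "cat"), ("t3", "c", "dog"),
    ("t4", "a", "dog"), ("t4", "b", "dog"), ("t4", "c", "dog"),
]
labels, probas, priors, errors = dawid_skene_em(answers)
print(labels)
```

The sketch multiplies raw probabilities, which is fine for a handful of workers per task but can underflow on large datasets; working in log space would be the usual remedy.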
A. Philip Dawid and Allan M. Skene. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 28, 1 (1979), 20–28.
The number of EM iterations.
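Tasks' labels. A pandas.Series indexed by task, holding each task's most likely true label.
Tasks' label probability distributions. A pandas.DataFrame indexed by task, with one column per label, holding the probability that the task's true label equals that label.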
A prior label distribution. A pandas.Series indexed by labels that holds each label's probability of occurrence. Each probability is between 0 and 1, and all probabilities should sum to 1.
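Workers' error matrices. A pandas.DataFrame indexed by worker and observed label, with one column per true label, holding the probability that the worker gives the observed label when the task's true label is the column label.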
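The constraints on the prior label distribution stated above (each probability within [0, 1], all probabilities summing to 1) can be checked with a small helper. This is a sketch only: `check_prior` is a hypothetical name, not part of crowdkit, and a plain dict stands in for the pandas.Series (the helper relies only on `.items()`, which both provide).

```python
import math


def check_prior(prior, tol=1e-9):
    """Return True if `prior` (label -> probability) is a valid distribution."""
    values = [v for _, v in prior.items()]
    # Every probability must lie in [0, 1] ...
    in_range = all(0.0 <= v <= 1.0 for v in values)
    # ... and together they must sum to 1 up to floating-point tolerance.
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)


print(check_prior({"relevant": 0.3, "irrelevant": 0.7}))
```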
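```python
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

# Load crowd answers (df) and ground-truth labels (gt).
df, gt = load_dataset('relevance-2')
# Run up to 100 EM iterations and return the aggregated labels.
ds = DawidSkene(100)
result = ds.fit_predict(df)
```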
| Method | Description |
|---|---|
| fit | Fit the model through the EM algorithm. |
| fit_predict | Fit the model and return aggregated results. |
| fit_predict_proba | Fit the model and return probability distributions on labels for each task. |