crowdkit.aggregation.classification.glad.GLAD | Source code

GLAD(    self,    n_iter: int = 100,    tol: float = 1e-05,    silent: bool = True,    labels_priors: Optional[Series] = None,    alphas_priors_mean: Optional[Series] = None,    betas_priors_mean: Optional[Series] = None,    m_step_max_iter: int = 25,    m_step_tol: float = 0.01)

Generative model of Labels, Abilities, and Difficulties.

A probabilistic model that parametrizes workers' abilities and tasks' dificulties.

Let's consider a case of $K$ class classification. Let $p$ be a vector of prior class probabilities,

$\alpha_i \in (-\infty, +\infty)$ be a worker's ability parameter, $\beta_j \in (0, +\infty)$ be an inverse task's difficulty, $z_j$ be a latent variable representing the true task's label, and $y^i_j$ be a worker's response that we observe. The relationships between this variables and parameters according to GLAD are represented by the following latent label model:

The prior probability of $z_j$ being equal to $c$ is

$\operatorname{Pr}(z_j = c) = p[c]$,

the probability distribution of the worker's responses conditioned by the true label value $c$ follows the single coin Dawid-Skene model where the true label probability is a sigmoid function of the product of worker's ability and inverse task's difficulty:

$\operatorname{Pr}(y^i_j = k | z_j = c) = \begin{cases}a(i, j), & k = c \\ \frac{1 - a(i,j)}{K-1}, & k \neq c\end{cases}$,

where

$a(i,j) = \frac{1}{1 + \exp(-\alpha_i\beta_j)}$.

Parameters $p$, $\alpha$, $\beta$ and latent variables $z$ are optimized through the Expectation-Minimization algorithm.

J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. Proceedings of the 22nd International Conference on Neural Information Processing Systems, 2009

https://proceedings.neurips.cc/paper/2009/file/f899139df5e1059396431415e770c6dd-Paper.pdf

Parameters Description

ParametersTypeDescription
max_iter-

Maximum number of EM iterations.

eps-

Threshold for convergence criterion.

silentbool

If false, show progress bar.

labels_priorsOptional[Series]

Prior label probabilities.

alphas_priors_meanOptional[Series]

Prior mean value of alpha parameters.

betas_priors_meanOptional[Series]

Prior mean value of beta parameters.

m_step_max_iterint

Maximum number of iterations of conjugate gradient method in M-step.

m_step_tolfloat

Tol parameter of conjugate gradient method in M-step.

labels_Optional[Series]

Tasks' labels. A pandas.Series indexed by task such that labels.loc[task] is the tasks's most likely true label.

probas_Optional[DataFrame]

Tasks' label probability distributions. A pandas.DataFrame indexed by task such that result.loc[task, label] is the probability of task's true label to be equal to label. Each probability is between 0 and 1, all task's probabilities should sum up to 1

alphas_Series

workers' alpha parameters. A pandas.Series indexed by worker that contains estimated alpha parameters.

betas_Series

Tasks' beta parameters. A pandas.Series indexed by task that contains estimated beta parameters.

Examples:

from crowdkit.aggregation import GLADfrom crowdkit.datasets import load_datasetdf, gt = load_dataset('relevance-2')glad = GLAD()result = glad.fit_predict(df)

Methods Summary

MethodDescription
fitFit the model through the EM-algorithm.
fit_predictFit the model and return aggregated results.
fit_predict_probaFit the model and return probability distributions on labels for each task.
Crowd-Kit
Overview
Reference
Aggregation
Datasets
Learning
Metrics
Postprocessing