GLAD

crowdkit.aggregation.classification.glad.GLAD

GLAD(
    self,
    n_iter: int = 100,
    tol: float = 1e-05,
    silent: bool = True,
    labels_priors: Optional[Series] = None,
    alphas_priors_mean: Optional[Series] = None,
    betas_priors_mean: Optional[Series] = None,
    m_step_max_iter: int = 25,
    m_step_tol: float = 0.01
)

The GLAD (Generative model of Labels, Abilities, and Difficulties) model is a probabilistic model that parametrizes the abilities of workers and the difficulty of tasks.

Let's consider a case of $K$-class classification. Let $p$ be a vector of prior class probabilities, $\alpha_i \in (-\infty, +\infty)$ be a worker ability parameter, $\beta_j \in (0, +\infty)$ be an inverse task difficulty, $z_j$ be a latent variable representing the true task label, and $y^i_j$ be a worker response that we observe. The relationships between these variables and parameters according to GLAD are represented by the following latent label model:

GLAD latent label model

The prior probability of $z_j$ being equal to $c$ is

$$\operatorname{Pr}(z_j = c) = p[c],$$

and the probability distribution of the worker responses with the true label cc follows the single coin Dawid-Skene model where the true label probability is a sigmoid function of the product of the worker ability and the inverse task difficulty:

$$\operatorname{Pr}(y^i_j = k \mid z_j = c) = \begin{cases} a(i, j), & k = c \\ \frac{1 - a(i, j)}{K - 1}, & k \neq c, \end{cases}$$

where

$$a(i, j) = \frac{1}{1 + \exp(-\alpha_i \beta_j)}.$$
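
For example, $\alpha_i \beta_j = 0$ gives $a(i, j) = 1/2$; a large positive product (an able worker on an easy task) pushes $a(i, j)$ toward 1, so the worker almost surely reports the true label, while a negative $\alpha_i$ (an adversarial worker) pushes $a(i, j)$ below $1/2$.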

Parameters $p$, $\alpha$, $\beta$, and latent variables $z$ are optimized with the Expectation-Maximization (EM) algorithm:

  1. E-step. Estimates the true task label probabilities using the alpha parameters of workers' abilities, the prior label probabilities, and the beta parameters of task difficulty (a sketch of this step follows the list).
  2. M-step. Optimizes the alpha and beta parameters using the conjugate gradient method.
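
The E-step has a closed form that follows directly from the model above: the posterior of $z_j$ is the prior $p[c]$ multiplied by the per-worker response probabilities and then renormalized. Below is a minimal NumPy sketch for a single task, assuming class labels are encoded as integers 0..K-1; the function and variable names are illustrative and this is not the Crowd-Kit implementation.

import numpy as np

def e_step_single_task(responses, alpha, beta_j, prior):
    """Posterior over the true label of one task.

    responses: dict mapping worker index i -> observed label y^i_j
    alpha:     array of worker ability parameters
    beta_j:    inverse difficulty of this task (scalar)
    prior:     array p of prior class probabilities
    """
    n_classes = len(prior)
    log_post = np.log(np.asarray(prior, dtype=float))  # log Pr(z_j = c)
    for i, y in responses.items():
        a = 1.0 / (1.0 + np.exp(-alpha[i] * beta_j))  # a(i, j)
        for c in range(n_classes):
            p_resp = a if y == c else (1.0 - a) / (n_classes - 1)
            log_post[c] += np.log(p_resp)
    post = np.exp(log_post - log_post.max())  # normalize in log space for stability
    return post / post.sum()  # Pr(z_j = c | responses) for each class c

# Example: three workers with different abilities label one binary task
alpha = np.array([2.0, 0.5, -1.0])
print(e_step_single_task({0: 1, 1: 1, 2: 0}, alpha, beta_j=1.0, prior=np.array([0.5, 0.5])))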

J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. Proceedings of the 22nd International Conference on Neural Information Processing Systems, 2009.

https://proceedings.neurips.cc/paper/2009/file/f899139df5e1059396431415e770c6dd-Paper.pdf

Parameters description

| Parameters | Type | Description |
| --- | --- | --- |
| n_iter | int | The maximum number of EM iterations. |
| tol | float | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the tol parameter. |
| silent | bool | Specifies if the progress bar is hidden (True) or shown (False). |
| labels_priors | Optional[Series] | The prior label probabilities. |
| alphas_priors_mean | Optional[Series] | The prior mean value of the alpha parameters. |
| betas_priors_mean | Optional[Series] | The prior mean value of the beta parameters. |
| m_step_max_iter | int | The maximum number of iterations of the conjugate gradient method in the M-step. |
| m_step_tol | float | The tolerance stopping criterion of the conjugate gradient method in the M-step. |
| labels_ | Optional[Series] | The task labels. The pandas.Series data is indexed by task so that labels.loc[task] is the most likely true label of the task. |
| probas_ | Optional[DataFrame] | The probability distributions of task labels. The pandas.DataFrame data is indexed by task so that result.loc[task, label] is the probability that the true label of the task is equal to label. Each probability is in the range from 0 to 1, and the probabilities for each task sum up to 1. |
| alphas_ | Series | The alpha parameters of workers' abilities. The pandas.Series data is indexed by worker and contains the estimated alpha parameters. |
| betas_ | Series | The beta parameters of task difficulty. The pandas.Series data is indexed by task and contains the estimated beta parameters. |
| loss_history_ | List[float] | A list of loss values during training. |

Examples:

from crowdkit.aggregation import GLAD
from crowdkit.datasets import load_dataset

# Load crowd labels and ground truth for the Relevance-2 dataset
df, gt = load_dataset('relevance-2')

# Aggregate the crowd labels into the most likely label per task
glad = GLAD()
result = glad.fit_predict(df)
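
After fitting, the estimated parameters are available as the attributes listed above. A small continuation of the example (it only inspects the fitted attributes; sort_values and head are standard pandas methods):

# Workers with the highest estimated ability (largest alpha)
print(glad.alphas_.sort_values(ascending=False).head())

# Tasks estimated to be the hardest (smallest inverse difficulty beta)
print(glad.betas_.sort_values().head())

# Probability distribution over labels for each task
print(glad.probas_.head())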

Methods summary

| Method | Description |
| --- | --- |
| fit | Fits the model to the training data with the EM algorithm. |
| fit_predict | Fits the model to the training data and returns the aggregated results. |
| fit_predict_proba | Fits the model to the training data and returns probability distributions of labels for each task. |

