GLAD
crowdkit.aggregation.classification.glad.GLAD
GLAD(
self,
n_iter: int = 100,
tol: float = 1e-05,
silent: bool = True,
labels_priors: Optional[Series] = None,
alphas_priors_mean: Optional[Series] = None,
betas_priors_mean: Optional[Series] = None,
m_step_max_iter: int = 25,
m_step_tol: float = 0.01
)
Generative model of Labels, Abilities, and Difficulties.
A probabilistic model that parametrizes workers' abilities and tasks' difficulties. Consider the case of $K$-class classification. Let $p$ be a vector of prior class probabilities, $\alpha_i \in (-\infty, +\infty)$ be the ability parameter of worker $i$, $\beta_j \in (0, +\infty)$ be the inverse difficulty of task $j$, $z_j$ be a latent variable representing the true label of task $j$, and $y^i_j$ be the observed response of worker $i$ on task $j$. In GLAD, the relationships between these variables and parameters are described by the following latent label model.
The prior probability of $z_j$ being equal to $c$ is
$$
\operatorname{Pr}(z_j = c) = p_c,
$$
and the distribution of a worker's responses conditioned on the true label value $c$ follows the single-coin Dawid-Skene model, where the probability of responding with the true label is a sigmoid function of the product of the worker's ability and the task's inverse difficulty:
$$
\operatorname{Pr}(y^i_j = k \mid z_j = c) = \begin{cases} a(i, j), & k = c, \\ \dfrac{1 - a(i, j)}{K - 1}, & k \neq c, \end{cases}
$$
where
$$
a(i, j) = \frac{1}{1 + \exp(-\alpha_i \beta_j)}.
$$
The parameters $p$, $\alpha$, and $\beta$ and the latent variables $z$ are optimized through the Expectation-Maximization (EM) algorithm.
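The model and its EM fit can be sketched from scratch with NumPy and SciPy. This is not crowd-kit's implementation, only an illustration under stated assumptions: workers, tasks, and labels are integer-coded; β_j is reparametrized as exp(log β_j) to keep it positive; hypothetical unit-variance Gaussian penalties pull α toward 1 and β toward 1; p is updated as the average posterior; and the M-step uses `scipy.optimize.minimize(method="CG")`. The keyword names `n_iter`, `tol`, `m_step_max_iter`, and `m_step_tol` mirror GLAD's parameters only by analogy, and `glad_em` and the other helpers are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logsumexp


def log_likelihoods(alpha, log_beta, w, t, y, K):
    """log Pr(y_n | z = c) for each observation n and class c, shape (N, K)."""
    a = expit(alpha[w] * np.exp(log_beta[t]))      # a(i, j) = sigmoid(alpha_i * beta_j)
    a = np.clip(a, 1e-12, 1 - 1e-12)               # keep the logs finite
    ll = np.repeat(np.log((1 - a) / (K - 1))[:, None], K, axis=1)  # entries with c != y_n
    ll[np.arange(len(y)), y] = np.log(a)           # entry with c == y_n
    return ll


def e_step(p, alpha, log_beta, w, t, y, K, n_tasks):
    """Posterior Pr(z_j = c | responses), shape (n_tasks, K)."""
    log_post = np.tile(np.log(p), (n_tasks, 1))    # log prior for every task
    np.add.at(log_post, t, log_likelihoods(alpha, log_beta, w, t, y, K))
    return np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))


def neg_q(params, post, w, t, y, K, n_workers, mu_alpha, mu_log_beta):
    """Negative expected log-likelihood plus hypothetical Gaussian penalties."""
    alpha, log_beta = params[:n_workers], params[n_workers:]
    expected = np.sum(post[t] * log_likelihoods(alpha, log_beta, w, t, y, K))
    penalty = 0.5 * np.sum((alpha - mu_alpha) ** 2) + 0.5 * np.sum((log_beta - mu_log_beta) ** 2)
    return penalty - expected


def glad_em(w, t, y, K, n_workers, n_tasks, n_iter=100, tol=1e-5,
            m_step_max_iter=25, m_step_tol=1e-2):
    p = np.full(K, 1.0 / K)                        # start from a uniform class prior
    mu_alpha, mu_log_beta = np.ones(n_workers), np.zeros(n_tasks)  # alpha = 1, beta = 1
    params = np.concatenate([mu_alpha, mu_log_beta])
    prev = np.inf
    for _ in range(n_iter):
        # E-step: posterior over the latent true labels z
        post = e_step(p, params[:n_workers], params[n_workers:], w, t, y, K, n_tasks)
        # M-step: closed-form update of p, conjugate gradient for alpha and log(beta)
        p = post.mean(axis=0)
        res = minimize(neg_q, params,
                       args=(post, w, t, y, K, n_workers, mu_alpha, mu_log_beta),
                       method="CG",
                       options={"maxiter": m_step_max_iter, "gtol": m_step_tol})
        params = res.x
        # stop when the M-step objective barely changes
        if abs(prev - res.fun) <= tol * max(1.0, abs(res.fun)):
            break
        prev = res.fun
    post = e_step(p, params[:n_workers], params[n_workers:], w, t, y, K, n_tasks)
    return post, params[:n_workers], np.exp(params[n_workers:])


# Toy data: 4 tasks with true labels [0, 1, 1, 0]; workers 0 and 1 answer
# correctly, worker 2 always gives the other label.
truth = np.array([0, 1, 1, 0])
w = np.repeat([0, 1, 2], 4)
t = np.tile(np.arange(4), 3)
y = np.where(w == 2, 1 - truth[t], truth[t])
post, alphas, betas = glad_em(w, t, y, K=2, n_workers=3, n_tasks=4)
print(post.argmax(axis=1))  # most likely label per task
```

On this toy data, worker 2's consistently flipped answers should push its alpha below those of workers 0 and 1, possibly below zero, which GLAD reads as an adversarial worker in the binary case.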
J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise. Proceedings of the 22nd International Conference on Neural Information Processing Systems, 2009
https://proceedings.neurips.cc/paper/2009/file/f899139df5e1059396431415e770c6dd-Paper.pdf
Parameters
Parameters | Type | Description |
---|---|---|
n_iter | int | Maximum number of EM iterations. |
tol | float | Threshold for the convergence criterion. |
silent | bool | If False, show a progress bar. |
labels_priors | Optional[Series] | Prior label probabilities. |
alphas_priors_mean | Optional[Series] | Prior mean values of the alpha parameters. |
betas_priors_mean | Optional[Series] | Prior mean values of the beta parameters. |
m_step_max_iter | int | Maximum number of conjugate gradient iterations in the M-step. |
m_step_tol | float | Tolerance of the conjugate gradient method in the M-step. |
labels_ | Optional[Series] | Tasks' labels. A pandas.Series indexed by |
probas_ | Optional[DataFrame] | Tasks' label probability distributions. A pandas.DataFrame indexed by |
alphas_ | Series | Workers' alpha parameters. A pandas.Series indexed by |
betas_ | Series | Tasks' beta parameters. A pandas.Series indexed by |
Examples:
from crowdkit.aggregation import GLAD
from crowdkit.datasets import load_dataset
df, gt = load_dataset('relevance-2')
glad = GLAD()
result = glad.fit_predict(df)
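The example loads `gt` but does not use it. One way to score the aggregated labels against it is sketched below with pandas only. It assumes that both `result` and `gt` are pandas.Series indexed by task; the toy Series are hypothetical stand-ins for the real ones.

```python
import pandas as pd

# Hypothetical stand-ins for `result` (aggregated labels) and `gt` (ground truth),
# both assumed to be pandas.Series indexed by task.
result = pd.Series({"t1": 1, "t2": 0, "t3": 1})
gt = pd.Series({"t1": 1, "t3": 0, "t4": 1})

# Score only the tasks that have a ground-truth label.
common = result.index.intersection(gt.index)
accuracy = (result.loc[common] == gt.loc[common]).mean()
print(f"accuracy on {len(common)} labelled tasks: {accuracy:.2f}")
```

Restricting to the shared index avoids comparing tasks that have no ground truth, which is common when only a subset of tasks is labelled by experts.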
Methods Summary
Method | Description |
---|---|
fit | Fit the model with the EM algorithm. |
fit_predict | Fit the model and return aggregated results. |
fit_predict_proba | Fit the model and return probability distributions on labels for each task. |
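The two prediction methods are closely related: a hard label per task can be read off a probability table by taking its most probable column. A minimal pandas sketch, assuming a task-by-label DataFrame shaped like the one `probas_` describes; the values are made up and this is not necessarily how the library computes `labels_`.

```python
import pandas as pd

# Hypothetical output of fit_predict_proba: one row per task, one column per label.
probas = pd.DataFrame(
    {0: [0.9, 0.2, 0.55], 1: [0.1, 0.8, 0.45]},
    index=pd.Index(["t1", "t2", "t3"], name="task"),
)

# Most probable label per task, i.e. a fit_predict-style result.
labels = probas.idxmax(axis=1)
print(labels.to_dict())
```

`idxmax` breaks ties in favor of the first column, which may differ from the library's own tie-breaking.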