MMSR

crowdkit.aggregation.classification.m_msr.MMSR

MMSR(
    self,
    n_iter: int = 10000,
    tol: float = 1e-10,
    random_state: Optional[int] = 0,
    observation_matrix: ... = _Nothing.NOTHING,
    covariation_matrix: ... = _Nothing.NOTHING,
    n_common_tasks: ... = _Nothing.NOTHING,
    n_workers: int = 0,
    n_tasks: int = 0,
    n_labels: int = 0,
    labels_mapping: Dict[Any, int] = _Nothing.NOTHING,
    workers_mapping: Dict[Any, int] = _Nothing.NOTHING,
    tasks_mapping: Dict[Any, int] = _Nothing.NOTHING
)

The Matrix Mean-Subsequence-Reduced Algorithm (M-MSR) model assumes that workers have different levels of expertise and represents them as a vector of "skills" $s$, whose entries $s_i$ give the probability that worker $i$ answers a given task correctly. Under this model, the skills can be estimated by solving a rank-one matrix completion problem:

$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T$ ,

where $M$ is the total number of classes, $\widetilde{C}$ is a covariance matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix which has the same size as $\widetilde{C}$ .

Estimating the skill vector $s$ is thus equivalent to a rank-one matrix completion problem. M-MSR is an iterative algorithm for robust rank-one matrix completion, so its output is an estimate of $s$.
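The identity above can be checked numerically on a toy case. The sketch below is not Crowd-Kit's implementation: the skill values and the helper `expected_c` are hypothetical, and it simply takes the stated identity at face value, building the expected $\widetilde{C}$ from an assumed skill vector and confirming that the transformed matrix factors as $\boldsymbol{s}\boldsymbol{s}^T$.

```python
import math

# Hypothetical values, chosen only for illustration.
M = 3                  # total number of classes
s = [0.9, 0.7, 0.6]    # assumed skill vector

def expected_c(s_i, s_j, m):
    """Solve the identity for one entry: C_ij = ((m - 1) * s_i * s_j + 1) / m."""
    return ((m - 1) * s_i * s_j + 1) / m

n = len(s)
C = [[expected_c(s[i], s[j], M) for j in range(n)] for i in range(n)]

# Apply the transformation from the identity: M/(M-1) * C - 1/(M-1) * 11^T.
R = [[M / (M - 1) * C[i][j] - 1 / (M - 1) for j in range(n)] for i in range(n)]

# R is rank one, so a skill can be recovered from off-diagonal entries alone:
# R01 * R02 / R12 = (s0*s1) * (s0*s2) / (s1*s2) = s0**2.
s0 = math.sqrt(R[0][1] * R[0][2] / R[1][2])
```

Here `R[i][j]` equals `s[i] * s[j]` up to floating-point error, and `s0` recovers the first skill. With real data the entries of $\widetilde{C}$ are noisy and some may be missing or adversarially corrupted, which is why a robust completion method is used rather than this direct formula.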

The final aggregation is a weighted majority vote in which the vote of worker $i$ has weight $\log \frac{(M-1)s_i}{1-s_i}$.
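The weighting rule can be illustrated with a small standalone sketch. It is not Crowd-Kit's code: the function names `vote_weight` and `weighted_majority_vote`, the skill values, and the toy answers are all assumptions made for the example, and skills are assumed to lie strictly between 0 and 1 so that the logarithm is defined.

```python
import math
from collections import defaultdict

def vote_weight(skill, n_labels):
    """Weight log((M - 1) * s_i / (1 - s_i)) for a worker with skill s_i."""
    return math.log((n_labels - 1) * skill / (1 - skill))

def weighted_majority_vote(answers, skills, n_labels):
    """Pick, for each task, the label with the largest total vote weight.

    answers: iterable of (task, worker, label) tuples.
    skills: dict mapping worker to a skill strictly between 0 and 1.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in answers:
        scores[task][label] += vote_weight(skills[worker], n_labels)
    return {task: max(by_label, key=by_label.get) for task, by_label in scores.items()}

# Hypothetical skills and answers for a binary task (M = 2).
skills = {"w1": 0.9, "w2": 0.6, "w3": 0.6}
answers = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
labels = weighted_majority_vote(answers, skills, n_labels=2)
print(labels)  # {'t1': 'cat', 't2': 'dog'}
```

For task `t1`, the single highly skilled worker (weight $\log 9 \approx 2.20$) outvotes the two less skilled workers (weight $\log 1.5 \approx 0.41$ each), which a plain majority vote would not do.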

Q. Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion.

34th Conference on Neural Information Processing Systems (NeurIPS 2020)

https://arxiv.org/abs/2010.12181

Parameters description

Parameters	Type	Description
`n_iter`	int	The maximum number of iterations.
`tol`	float	The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the `tol` parameter.
`random_state`	Optional[int]	The seed number for the random initialization.
`_observation_matrix`	-	The matrix representing which workers give responses to which tasks.
`_covariation_matrix`	-	The matrix representing the covariance between workers.
`_n_common_tasks`	-	The matrix whose entries give the number of tasks that each pair of workers has in common.
`_n_workers`	-	The number of workers.
`_n_tasks`	-	The number of tasks that are assigned to workers.
`_n_labels`	-	The number of possible labels for a series of classification tasks.
`_labels_mapping`	-	The mapping of labels and integer values.
`_workers_mapping`	-	The mapping of workers and integer values.
`_tasks_mapping`	-	The mapping of tasks and integer values.
`labels_`	Optional[Series]	The task labels. The `pandas.Series` data is indexed by `task` so that `labels.loc[task]` is the most likely true label of that task.
`skills_`	Optional[Series]	The workers' skills. The `pandas.Series` data is indexed by `worker` and has the corresponding worker skill.
`scores_`	Optional[DataFrame]	The task label scores. The `pandas.DataFrame` data is indexed by `task` so that `result.loc[task, label]` is a score of `label` for `task`.
`loss_history_`	List[float]	A list of loss values during training.

Examples:

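from crowdkit.aggregation import MMSR
from crowdkit.datasets import load_dataset
# df holds the crowdsourced responses; gt holds the ground-truth labels.
df, gt = load_dataset('relevance-2')
mmsr = MMSR()
# Fit the model and return the aggregated label for each task.
result = mmsr.fit_predict(df)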

Methods summary

Method	Description
fit	Fits the model to the training data.
fit_predict	Fits the model to the training data and returns the aggregated results.
fit_predict_score	Fits the model to the training data and returns the total sum of weights for each label.
predict	Predicts the true labels of tasks when the model is fitted.
predict_score	Returns the total sum of weights for each label when the model is fitted.

Last updated: March 31, 2023
