# MMSR

crowdkit.aggregation.classification.m_msr.MMSR

    MMSR(
        self,
        n_iter: int = 10000,
        tol: float = 1e-10,
        random_state: Optional[int] = 0,
        observation_matrix: ... = _Nothing.NOTHING,
        covariation_matrix: ... = _Nothing.NOTHING,
        n_common_tasks: ... = _Nothing.NOTHING,
        n_workers: int = 0,
        n_tasks: int = 0,
        n_labels: int = 0,
        labels_mapping: Dict[Any, int] = _Nothing.NOTHING,
        workers_mapping: Dict[Any, int] = _Nothing.NOTHING,
        tasks_mapping: Dict[Any, int] = _Nothing.NOTHING
    )

The Matrix Mean-Subsequence-Reduced Algorithm (M-MSR) model assumes that workers have different levels of expertise, represented by a vector of "skills" $s$ whose entries $s_i$ give the probability that worker $i$ answers a given task correctly. The skill of each worker can then be estimated by solving a rank-one matrix completion problem:

$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T$,

where $M$ is the total number of classes, $\widetilde{C}$ is a covariance matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix which has the same size as $\widetilde{C}$.

Thus, estimating the skill vector $s$ is equivalent to solving a rank-one matrix completion problem. M-MSR is an iterative algorithm for robust rank-one matrix completion, so its output is an estimator of $s$.
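To make the rank-one completion step concrete, here is a minimal sketch that recovers a skill vector from the off-diagonal entries of $ss^T$. It uses plain alternating least squares, not the robust, trimmed updates of M-MSR, and it is not Crowd-Kit's implementation. The function name `rank_one_complete`, the `mask` argument, the starting point, and the toy skill values are all assumptions made for illustration.

```python
import numpy as np

def rank_one_complete(X, mask, n_iter=500, tol=1e-10):
    """Fit X[i, j] ~ s[i] * s[j] on the observed entries (mask[i, j] is True).

    Plain coordinate-wise least squares; a non-robust stand-in for M-MSR.
    """
    n = X.shape[0]
    s = np.full(n, 0.5)  # arbitrary positive starting point (assumption)
    for _ in range(n_iter):
        s_prev = s.copy()
        for i in range(n):
            obs = mask[i]                      # entries observed in row i
            denom = np.sum(s[obs] ** 2)
            if denom > 0:
                # Least-squares solution for s[i] with the other entries fixed.
                s[i] = np.dot(X[i, obs], s[obs]) / denom
        if np.max(np.abs(s - s_prev)) < tol:   # stop when the update stalls
            break
    return s

# Toy data: five workers with hypothetical skills; the diagonal is unobserved.
s_true = np.array([0.9, 0.8, 0.7, 0.6, 0.3])
X = np.outer(s_true, s_true)
mask = ~np.eye(len(s_true), dtype=bool)
s_hat = rank_one_complete(X, mask)
```

On this noise-free toy input the iteration should return a vector close to `s_true`. With adversarial or very noisy workers a plain least-squares fit like this can be badly skewed, which is the case the robust M-MSR updates are designed for.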

The final aggregation is weighted majority voting, where the vote of worker $i$ has weight $\log \frac{(M-1)s_i}{1-s_i}$.
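The weighting rule can be illustrated with a short sketch for a single task. It applies the formula exactly as written above, treating each $s_i$ as a value strictly between 0 and 1. The helper `weighted_majority_vote`, the worker names, and the skill values are hypothetical and are not part of Crowd-Kit's API.

```python
import numpy as np

def weighted_majority_vote(answers, skills, n_labels):
    """Aggregate one task's answers.

    answers:  list of (worker, label) pairs, labels encoded as 0..n_labels-1
    skills:   dict mapping worker -> s_i, assumed to lie strictly in (0, 1)
    """
    scores = np.zeros(n_labels)
    for worker, label in answers:
        s = skills[worker]
        # Weight from the text: log((M - 1) * s_i / (1 - s_i)).
        scores[label] += np.log((n_labels - 1) * s / (1 - s))
    return int(np.argmax(scores)), scores

# Hypothetical example with M = 3 labels: one strong worker against two weaker ones.
skills = {"w1": 0.9, "w2": 0.6, "w3": 0.6}
answers = [("w1", 0), ("w2", 1), ("w3", 1)]
label, scores = weighted_majority_vote(answers, skills, n_labels=3)
```

Here the single high-skill worker outweighs the two weaker workers, because $\log 18 > 2\log 3$, so the aggregated label is 0 even though it is the minority answer.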

Q. Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion. 34th Conference on Neural Information Processing Systems (NeurIPS 2020). https://arxiv.org/abs/2010.12181

## Parameters description

| Parameters | Type | Description |
| --- | --- | --- |
| `n_iter` | `int` | The maximum number of iterations. |
| `tol` | `float` | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the `tol` parameter. |
| `random_state` | `Optional[int]` | The seed number for the random initialization. |
| `_observation_matrix` | - | The matrix representing which workers give responses to which tasks. |
| `_covariation_matrix` | - | The matrix representing the covariance between workers. |
| `_n_common_tasks` | - | The matrix representing workers with tasks in common. |
| `_n_workers` | - | The number of workers. |
| `_n_tasks` | - | The number of tasks that are assigned to workers. |
| `_n_labels` | - | The number of possible labels for a series of classification tasks. |
| `_labels_mapping` | - | The mapping of labels and integer values. |
| `_workers_mapping` | - | The mapping of workers and integer values. |
| `_tasks_mapping` | - | The mapping of tasks and integer values. |
| `labels_` | `Optional[Series]` | The task labels. The `pandas.Series` data is indexed by `task` so that `labels.loc[task]` is the most likely true label of tasks. |
| `skills_` | `Optional[Series]` | The workers' skills. The `pandas.Series` data is indexed by `worker` and has the corresponding worker skill. |
| `scores_` | `Optional[DataFrame]` | The task label scores. The `pandas.DataFrame` data is indexed by `task` so that `result.loc[task, label]` is a score of `label` for `task`. |
| `loss_history_` | `List[float]` | A list of loss values during training. |

Examples:

    from crowdkit.aggregation import MMSR
    from crowdkit.datasets import load_dataset

    df, gt = load_dataset('relevance-2')
    mmsr = MMSR()
    result = mmsr.fit_predict(df)

## Methods summary

| Method | Description |
| --- | --- |
| `fit` | Fits the model to the training data. |
| `fit_predict` | Fits the model to the training data and returns the aggregated results. |
| `fit_predict_score` | Fits the model to the training data and returns the total sum of weights for each label. |
| `predict` | Predicts the true labels of tasks when the model is fitted. |
| `predict_score` | Returns the total sum of weights for each label when the model is fitted. |

Last updated: March 31, 2023
