crowdkit.aggregation.classification.m_msr.MMSR

n_iter: int = 10000,
tol: float = 1e-10,
random_state: Optional[int] = 0,
observation_matrix: ... = _Nothing.NOTHING,
covariation_matrix: ... = _Nothing.NOTHING,
n_common_tasks: ... = _Nothing.NOTHING,
n_workers: int = 0,
n_tasks: int = 0,
n_labels: int = 0,
labels_mapping: Dict[Any, int] = _Nothing.NOTHING,
workers_mapping: Dict[Any, int] = _Nothing.NOTHING,
tasks_mapping: Dict[Any, int] = _Nothing.NOTHING

The Matrix Mean-Subsequence-Reduced Algorithm (M-MSR) model assumes that workers have different levels of expertise, represented by a vector of "skills" $\boldsymbol{s}$ whose entry $s_i$ is the probability that worker $i$ answers a given task correctly. Under this model, the skill of each worker can be estimated by solving a rank-one matrix completion problem:

$$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T,$$

where $M$ is the total number of classes, $\widetilde{C}$ is the covariance matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix of the same size as $\widetilde{C}$.

Thus, estimating the skill vector $\boldsymbol{s}$ is equivalent to solving a rank-one matrix completion problem. M-MSR is an iterative algorithm for robust rank-one matrix completion, so its output is an estimate of $\boldsymbol{s}$.
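To make the rank-one structure concrete, the following is a minimal sketch that recovers a skill vector from a fully observed, noise-free matrix built as the outer product of that vector with itself, using plain power iteration. This is not the M-MSR algorithm: it does not handle missing entries or adversarial workers, and the function name `top_rank_one_factor` and the skill values are hypothetical, chosen only for illustration.

```python
import math


def top_rank_one_factor(matrix, n_iter=1000, tol=1e-12):
    """Return a vector v such that v v^T approximates a symmetric matrix.

    Plain power iteration on a fully observed matrix (illustration only,
    not the robust M-MSR procedure).
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    eigenvalue = 0.0
    for _ in range(n_iter):
        # Multiply the matrix by the current vector.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        converged = abs(norm - eigenvalue) < tol
        v, eigenvalue = [x / norm for x in w], norm
        if converged:
            break
    # Scale the unit eigenvector so that v v^T has the right magnitude.
    scale = math.sqrt(eigenvalue)
    return [scale * x for x in v]


# Hypothetical skills, used only to build a rank-one test matrix.
skills = [0.9, 0.7, 0.6, 0.8]
outer = [[a * b for b in skills] for a in skills]
recovered = top_rank_one_factor(outer)
```

Both a vector and its negation produce the same outer product, so a factorization alone leaves the sign undetermined; in this sketch the positive starting vector happens to select the positive solution.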

The aggregation is then a weighted majority vote in which worker $i$ has weight $\log \frac{(M-1)s_i}{1-s_i}$.
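The weighting rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not Crowd-Kit's implementation: the helper names, the worker and task identifiers, and the skill values are all hypothetical, and every skill is assumed to lie strictly between 0 and 1 so that the logarithm is defined.

```python
import math
from collections import defaultdict


def skill_weights(skills, n_labels):
    """Map each skill s_i to the weight log((M - 1) * s_i / (1 - s_i))."""
    return {
        worker: math.log((n_labels - 1) * s / (1.0 - s))
        for worker, s in skills.items()
    }


def weighted_majority_vote(answers, weights):
    """Pick, for each task, the label with the largest total worker weight.

    answers is an iterable of (task, worker, label) triples.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in answers:
        scores[task][label] += weights[worker]
    return {
        task: max(by_label, key=by_label.get)
        for task, by_label in scores.items()
    }


# Hypothetical data: one strong worker and two weaker ones, two classes.
skills = {"w1": 0.9, "w2": 0.6, "w3": 0.6}
answers = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
weights = skill_weights(skills, n_labels=2)
labels = weighted_majority_vote(answers, weights)
```

On task `t1` the single high-skill worker outweighs the two lower-skill workers, so the result differs from an unweighted majority vote.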

Q. Ma and A. Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion. 34th Conference on Neural Information Processing Systems (NeurIPS 2020).

Parameters description


n_iter: The maximum number of iterations.

tol: The convergence tolerance. Iteration stops once the change in loss between consecutive iterations is less than tol.

random_state: The seed for the random initialization.

observation_matrix: The matrix representing which workers responded to which tasks.

covariation_matrix: The matrix representing the covariance between workers.

n_common_tasks: The matrix representing workers with tasks in common.

n_workers: The number of workers.

n_tasks: The number of tasks that are assigned to workers.

n_labels: The number of possible labels for a series of classification tasks.

labels_mapping: The mapping of labels and integer values.

workers_mapping: The mapping of workers and integer values.

tasks_mapping: The mapping of tasks and integer values.


labels_: The task labels. The pandas.Series is indexed by task, so labels.loc[task] is the most likely true label of that task.

skills_: The workers' skills. The pandas.Series is indexed by worker and holds the corresponding worker skill.

scores_: The task label scores. The pandas.DataFrame is indexed by task, so result.loc[task, label] is the score of label for task.

loss_history_: A list of loss values during training.
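How the iteration limit, the tolerance, and the recorded loss values interact can be sketched with a generic loop. This is an assumption about the usual shape of such a stopping rule, not Crowd-Kit's actual training loop; `run_until_converged` and the toy `halve` update are hypothetical names introduced for illustration.

```python
def run_until_converged(step, n_iter=10000, tol=1e-10):
    """Call step() up to n_iter times and record every loss value.

    Stops early once the loss changes by less than tol between two
    consecutive calls. Generic sketch, not the library's loop.
    """
    loss_history = []
    previous = None
    for _ in range(n_iter):
        loss = step()
        loss_history.append(loss)
        if previous is not None and abs(previous - loss) < tol:
            break
        previous = loss
    return loss_history


# Toy update whose "loss" halves on every call.
state = {"x": 1.0}


def halve():
    state["x"] *= 0.5
    return state["x"]


history = run_until_converged(halve, n_iter=100, tol=1e-3)
```

In this toy run the loop stops well before the 100-iteration cap, because the change in loss drops below the tolerance first; a tighter tolerance would produce a longer history.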


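from crowdkit.aggregation import MMSR
from crowdkit.datasets import load_dataset

# Load worker responses (df) and ground-truth labels (gt).
df, gt = load_dataset('relevance-2')
mmsr = MMSR()
# Fit the model and return the aggregated label for each task.
result = mmsr.fit_predict(df)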

Methods summary

fit: Fits the model to the training data.
fit_predict: Fits the model to the training data and returns the aggregated results.
fit_predict_score: Fits the model to the training data and returns the total sum of weights for each label.
predict: Predicts the true labels of tasks when the model is fitted.
predict_score: Returns the total sum of weights for each label when the model is fitted.

Last updated: March 31, 2023