MMSR

crowdkit.aggregation.classification.m_msr.MMSR

MMSR(
self,
n_iter: int = 10000,
tol: float = 1e-10,
random_state: Optional[int] = 0,
observation_matrix: ... = ...,
covariation_matrix: ... = ...,
n_common_tasks: ... = ...,
n_workers: int = 0,
n_tasks: int = 0,
n_labels: int = 0,
labels_mapping: Dict[Any, int] = ...,
workers_mapping: Dict[Any, int] = ...,
tasks_mapping: Dict[Any, int] = ...
)

Matrix Mean-Subsequence-Reduced Algorithm.

M-MSR assumes that workers have different levels of expertise, described by a vector of "skills" $\boldsymbol{s}$ whose entry $s_i$ is the probability that worker $i$ answers a given task correctly. Under this assumption, one can show that

$$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T,$$

where $M$ is the total number of classes, $\widetilde{C}$ is the covariation matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix of the same size as $\widetilde{C}$.
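The identity can be checked numerically on a small case. The sketch below is an illustration under stated assumptions, not crowd-kit's code: it assumes that the entry of the covariation matrix for workers i and j is their expected agreement rate on shared tasks, that a wrong answer is uniform over the M - 1 incorrect classes, and that the skill entering the identity is the rescaled accuracy (M p_i - 1) / (M - 1), where p_i is a hypothetical accuracy of worker i.

```python
import numpy as np

M = 3                              # number of classes (hypothetical)
p = np.array([0.9, 0.7, 0.5])      # hypothetical worker accuracies

# Expected agreement of workers i and j on a shared task: both are correct,
# or both are wrong and pick the same one of the M - 1 wrong classes.
C = np.outer(p, p) + np.outer(1 - p, 1 - p) / (M - 1)

# Left-hand side: M/(M-1) * C - 1/(M-1) * 11^T
lhs = M / (M - 1) * C - np.ones_like(C) / (M - 1)

# Right-hand side: s s^T with the rescaled skill assumed in this sketch
s = (M * p - 1) / (M - 1)
rhs = np.outer(s, s)

# Compare off-diagonal entries only (pairs of distinct workers)
off_diag = ~np.eye(len(p), dtype=bool)
identity_holds = np.allclose(lhs[off_diag], rhs[off_diag])
```

With real data the matrix is estimated from a finite number of common tasks, so the equality holds only in expectation.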

So, the problem of recovering the skills vector $\boldsymbol{s}$ becomes equivalent to a rank-one matrix completion problem. M-MSR is an iterative algorithm for robust rank-one matrix completion, so its result is an estimator of the vector $\boldsymbol{s}$.
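To make the rank-one completion step concrete, here is a minimal sketch that recovers a skills vector from the off-diagonal entries of its outer product. It uses plain alternating least squares, not the robust M-MSR update from the paper, and it assumes noise-free, strictly positive skills; the function name `rank_one_complete` and the test vector are hypothetical.

```python
import numpy as np

def rank_one_complete(X, mask, n_iter=200, seed=0):
    """Fit X ~ u v^T on observed entries (mask == True) by alternating least squares."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.5, 1.5, X.shape[0])   # positive random start
    Xo = np.where(mask, X, 0.0)             # zero out unobserved entries
    W = mask.astype(float)
    for _ in range(n_iter):
        # Closed-form least-squares update of each factor with the other fixed
        u = Xo @ v / (W @ v**2)
        v = Xo.T @ u / (W.T @ u**2)
    # u and v equal s up to reciprocal scale factors; balance them
    k = np.sqrt(np.linalg.norm(u, 1) / np.linalg.norm(v, 1))
    return u / k

s_true = np.array([0.9, 0.6, 0.3, 0.75])    # hypothetical skills
X = np.outer(s_true, s_true)
mask = ~np.eye(len(s_true), dtype=bool)     # the diagonal is treated as unobserved
s_hat = rank_one_complete(X, mask)
```

Unlike this sketch, a robust method is meant to tolerate corrupted entries, for example those produced by adversarial workers.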

Then, the aggregation is a weighted majority vote with weights equal to $\log \frac{(M-1)s_i}{1-s_i}$.
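The weighting rule can be sketched in a few lines of plain Python. This is an illustration of the formula above, not crowd-kit's implementation; the function name, the `(task, worker, label)` tuple layout, and the sample skills are assumptions, and every skill must lie strictly between 0 and 1.

```python
import math
from collections import defaultdict

def weighted_majority_vote(answers, skills, n_labels):
    """answers: iterable of (task, worker, label); skills: dict worker -> skill."""
    # Weight of each worker: log((M - 1) * s_i / (1 - s_i))
    weights = {
        worker: math.log((n_labels - 1) * s / (1 - s))
        for worker, s in skills.items()
    }
    # Sum the weights of the workers who chose each label, per task
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in answers:
        scores[task][label] += weights[worker]
    # Pick the label with the largest total weight for each task
    return {task: max(by_label, key=by_label.get) for task, by_label in scores.items()}

skills = {"w1": 0.9, "w2": 0.6, "w3": 0.6}   # hypothetical skills
answers = [("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "dog")]
labels = weighted_majority_vote(answers, skills, n_labels=2)
```

Here the single high-skill worker outweighs the two weaker ones, since log 9 (about 2.20) exceeds 2 log 1.5 (about 0.81), so the result differs from a simple majority vote.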

Qianqian Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion. 34th Conference on Neural Information Processing Systems (NeurIPS 2020).

https://arxiv.org/abs/2010.12181

Parameters Description

| Parameters | Type | Description |
| --- | --- | --- |
| n_iter | int | The maximum number of iterations of the M-MSR algorithm. |
| tol | float | Convergence threshold. |
| random_state | Optional[int] | Seed number for the random initialization. |
| labels_ | Optional[Series] | Tasks' labels. A pandas.Series indexed by task such that labels.loc[task] is the task's most likely true label. |
| skills_ | Optional[Series] | Workers' skills. A pandas.Series indexed by worker and holding the corresponding worker's skill. |
| scores_ | Optional[DataFrame] | Tasks' label scores. A pandas.DataFrame indexed by task such that result.loc[task, label] is the score of label for task. |

Examples:

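from crowdkit.aggregation import MMSR
from crowdkit.datasets import load_dataset

# Load crowd annotations (df) and ground-truth labels (gt)
df, gt = load_dataset('relevance-2')
mmsr = MMSR()
# Estimate the workers' skills and return the aggregated labels
result = mmsr.fit_predict(df)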

Methods Summary

| Method | Description |
| --- | --- |
| fit | Estimate the workers' skills. |
| fit_predict | Fit the model and return aggregated results. |
| fit_predict_score | Fit the model and return the total sum of weights for each label. |
| predict | Infer the true labels when the model is fitted. |
| predict_score | Return total sum of weights for each label when the model is fitted. |