MMSR

crowdkit.aggregation.classification.m_msr.MMSR | Source code

MMSR(
self,
n_iter: int = 10000,
tol: float = 1e-10,
random_state: Optional[int] = 0,
observation_matrix: ... = _Nothing.NOTHING,
covariation_matrix: ... = _Nothing.NOTHING,
n_common_tasks: ... = _Nothing.NOTHING,
n_workers: int = 0,
n_tasks: int = 0,
n_labels: int = 0,
labels_mapping: Dict[Any, int] = _Nothing.NOTHING,
workers_mapping: Dict[Any, int] = _Nothing.NOTHING,
tasks_mapping: Dict[Any, int] = _Nothing.NOTHING
)

The Matrix Mean-Subsequence-Reduced Algorithm (M-MSR) model assumes that workers have different expertise levels and represents them as a vector of "skills" $\boldsymbol{s}$, whose entries $s_i$ give the probability that worker $i$ answers a given task correctly. The skills can then be estimated by solving a rank-one matrix completion problem:

$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T,$

where $M$ is the total number of classes, $\widetilde{C}$ is a covariance matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix, which has the same size as $\widetilde{C}$.

Thus, estimating the skill vector $\boldsymbol{s}$ is equivalent to a rank-one matrix completion problem. M-MSR is an iterative algorithm for robust rank-one matrix completion, so its output is an estimate of $\boldsymbol{s}$.

The aggregation is then weighted majority voting with weights equal to $\log \frac{(M-1)s_i}{1-s_i}$.
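As a rough illustration of the aggregation step, the sketch below computes the weights $\log \frac{(M-1)s_i}{1-s_i}$ from a given skill vector and applies weighted majority voting in plain Python. The helper names (`skill_weight`, `weighted_majority_vote`), the toy skills, and the toy responses are assumptions made for this example. It is not Crowd-Kit's internal implementation, and it takes the skills as given rather than estimating them with M-MSR.

```python
import math
from collections import defaultdict


def skill_weight(skill, n_labels):
    """Voting weight log((M - 1) * s / (1 - s)) for a skill s in (0, 1)."""
    return math.log((n_labels - 1) * skill / (1.0 - skill))


def weighted_majority_vote(responses, skills, n_labels):
    """Aggregate (task, worker, label) triples by summing worker weights per label."""
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in responses:
        scores[task][label] += skill_weight(skills[worker], n_labels)
    # For each task, pick the label with the largest total weight.
    return {task: max(by_label, key=by_label.get) for task, by_label in scores.items()}


# Hypothetical skills (assumed to be already estimated) for a binary task, M = 2.
skills = {"w1": 0.9, "w2": 0.6, "w3": 0.6}
responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
labels = weighted_majority_vote(responses, skills, n_labels=2)
```

In this toy data the single high-skill worker outweighs the two lower-skill workers on `t1`, so the result differs from unweighted majority voting. A skill of exactly $1/M$ gives a weight of zero, and skills of 0 or 1 make the logarithm undefined, so a practical implementation would need to clip the skills away from those values.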

Q. Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion.

34th Conference on Neural Information Processing Systems (NeurIPS 2020)

https://arxiv.org/abs/2010.12181

Parameters description

| Parameters | Type | Description |
| --- | --- | --- |
| n_iter | int | The maximum number of iterations. |
| tol | float | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the tol parameter. |
| random_state | Optional[int] | The seed number for the random initialization. |
| _observation_matrix | - | The matrix representing which workers give responses to which tasks. |
| _covariation_matrix | - | The matrix representing the covariance between workers. |
| _n_common_tasks | - | The matrix representing workers with tasks in common. |
| _n_workers | - | The number of workers. |
| _n_tasks | - | The number of tasks that are assigned to workers. |
| _n_labels | - | The number of possible labels for a series of classification tasks. |
| _labels_mapping | - | The mapping of labels to integer values. |
| _workers_mapping | - | The mapping of workers to integer values. |
| _tasks_mapping | - | The mapping of tasks to integer values. |
| labels_ | Optional[Series] | The task labels. The pandas.Series data is indexed by task so that labels.loc[task] is the most likely true label of that task. |
| skills_ | Optional[Series] | The workers' skills. The pandas.Series data is indexed by worker and holds the corresponding worker skill. |
| scores_ | Optional[DataFrame] | The task label scores. The pandas.DataFrame data is indexed by task so that result.loc[task, label] is the score of label for task. |
| loss_history_ | List[float] | A list of loss values during training. |
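The way `n_iter` and `tol` interact can be sketched as a generic iteration loop. The function `run_until_converged` and the toy update step below are hypothetical and only illustrate the stopping rule described for `tol`. They do not reproduce the M-MSR update itself or Crowd-Kit's code.

```python
def run_until_converged(step, state, n_iter=10000, tol=1e-10):
    """Apply `step` until the loss changes by less than `tol` or `n_iter` is reached.

    `step` maps a state to a (new_state, loss) pair.
    """
    loss_history = []
    previous_loss = float("inf")
    for _ in range(n_iter):
        state, loss = step(state)
        loss_history.append(loss)
        # Stop as soon as the loss change drops below the tolerance.
        if abs(previous_loss - loss) < tol:
            break
        previous_loss = loss
    return state, loss_history


# Toy step (an assumption for illustration): halve x and report x**2 as the loss.
def halve(x):
    new_x = x / 2.0
    return new_x, new_x ** 2


final_x, history = run_until_converged(halve, 1.0)
```

Here the loop stops well before `n_iter` because the loss change falls below `tol`. The list of per-iteration losses collected this way is analogous in spirit to `loss_history_`.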

Examples:

from crowdkit.aggregation import MMSR
from crowdkit.datasets import load_dataset
df, gt = load_dataset('relevance-2')
mmsr = MMSR()
result = mmsr.fit_predict(df)

Methods summary

| Method | Description |
| --- | --- |
| fit | Fits the model to the training data. |
| fit_predict | Fits the model to the training data and returns the aggregated results. |
| fit_predict_score | Fits the model to the training data and returns the total sum of weights for each label. |
| predict | Predicts the true labels of tasks when the model is fitted. |
| predict_score | Returns the total sum of weights for each label when the model is fitted. |

Last updated: March 31, 2023
