`crowdkit.aggregation.classification.m_msr.MMSR`


`MMSR( self, n_iter: int = 10000, tol: float = 1e-10, random_state: Optional[int] = 0, observation_matrix: ... = _Nothing.NOTHING, covariation_matrix: ... = _Nothing.NOTHING, n_common_tasks: ... = _Nothing.NOTHING, n_workers: int = 0, n_tasks: int = 0, n_labels: int = 0, labels_mapping: Dict[Any, int] = _Nothing.NOTHING, workers_mapping: Dict[Any, int] = _Nothing.NOTHING, tasks_mapping: Dict[Any, int] = _Nothing.NOTHING)`

The **Matrix Mean-Subsequence-Reduced Algorithm** (M-MSR) model assumes that workers have different levels of expertise and represents them
as a vector of "skills" $s$, whose entry $s_i$ is the probability
that worker $i$ answers a given task correctly. Under this assumption, the skill of each worker can be estimated by solving a rank-one matrix completion problem:

$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T$,

where $M$ is the total number of classes, $\widetilde{C}$ is the covariance matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix of the same size as $\widetilde{C}$.

Estimating the skill vector $\boldsymbol{s}$ is therefore equivalent to a
rank-one matrix completion problem. M-MSR is an iterative algorithm for *robust*
rank-one matrix completion, so its output is an estimator of $\boldsymbol{s}$.
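To make the rank-one structure concrete, here is a minimal sketch of recovering $\boldsymbol{s}$ from a fully observed, noise-free matrix $\boldsymbol{s}\boldsymbol{s}^T$. It uses a plain eigendecomposition from NumPy, not the robust M-MSR iteration, and the helper name `recover_skills_from_rank_one` is hypothetical and not part of crowd-kit.

```python
import numpy as np


def recover_skills_from_rank_one(R: np.ndarray) -> np.ndarray:
    """Recover s from a symmetric matrix R that equals s s^T.

    Illustration only: assumes R is fully observed and noise-free,
    which the robust M-MSR algorithm does not require.
    """
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(R)
    # The leading eigenpair of s s^T is (||s||^2, +/- s / ||s||).
    lead = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    # Resolve the sign ambiguity by assuming skills are mostly positive.
    if lead.sum() < 0:
        lead = -lead
    return lead


s = np.array([0.9, 0.7, 0.5, 0.3])   # made-up skill vector
R = np.outer(s, s)                   # the rank-one matrix s s^T
s_hat = recover_skills_from_rank_one(R)
```

In practice only some entries of the matrix are observed and some workers may be adversarial, which is why a robust completion method is used instead of a direct decomposition.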

The final aggregation is weighted majority voting, where worker $i$ has weight $\log \frac{(M-1)s_i}{1-s_i}$.
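The weighting rule can be sketched in plain Python as follows. This illustrates the formula only and is not crowd-kit's implementation: the function names, the `(task, worker, label)` tuple layout, the clipping constant `eps`, and the skill values are all assumptions made for the example.

```python
import math
from collections import defaultdict


def log_odds_weight(skill: float, n_labels: int, eps: float = 1e-10) -> float:
    """Weight log((M - 1) * s_i / (1 - s_i)) for a worker with skill s_i."""
    # Clip the skill away from 0 and 1 so the logarithm stays finite.
    s = min(max(skill, eps), 1.0 - eps)
    return math.log((n_labels - 1) * s / (1.0 - s))


def weighted_majority_vote(answers, skills, n_labels):
    """Pick, for each task, the label with the largest total worker weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in answers:
        scores[task][label] += log_odds_weight(skills[worker], n_labels)
    return {task: max(by_label, key=by_label.get) for task, by_label in scores.items()}


# Hypothetical responses as (task, worker, label) and made-up skills.
answers = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "cat"),
]
skills = {"w1": 0.95, "w2": 0.6, "w3": 0.55}
labels = weighted_majority_vote(answers, skills, n_labels=2)
```

With these values, the single high-skill worker `w1` outweighs the two low-skill workers on task `t1`, which plain majority voting would not allow.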

Q. Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion.

*34th Conference on Neural Information Processing Systems (NeurIPS 2020)*

https://arxiv.org/abs/2010.12181

| Parameters | Type | Description |
|---|---|---|
| `n_iter` | int | The maximum number of iterations. |
| `tol` | float | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the `tol` parameter. |
| `random_state` | Optional[int] | The seed number for the random initialization. |
| `_observation_matrix` | - | The matrix representing which workers gave responses to which tasks. |
| `_covariation_matrix` | - | The matrix representing the covariance between workers. |
| `_n_common_tasks` | - | The matrix representing workers with tasks in common. |
| `_n_workers` | - | The number of workers. |
| `_n_tasks` | - | The number of tasks that are assigned to workers. |
| `_n_labels` | - | The number of possible labels for a series of classification tasks. |
| `_labels_mapping` | - | The mapping of labels to integer values. |
| `_workers_mapping` | - | The mapping of workers to integer values. |
| `_tasks_mapping` | - | The mapping of tasks to integer values. |
| `labels_` | Optional[Series] | The task labels. |
| `skills_` | Optional[Series] | The workers' skills. |
| `scores_` | Optional[DataFrame] | The task label scores. |
| `loss_history_` | List[float] | A list of loss values during training. |

**Examples:**

```python
from crowdkit.aggregation import MMSR
from crowdkit.datasets import load_dataset

df, gt = load_dataset('relevance-2')
mmsr = MMSR()
result = mmsr.fit_predict(df)
```

| Method | Description |
|---|---|
| `fit` | Fits the model to the training data. |
| `fit_predict` | Fits the model to the training data and returns the aggregated results. |
| `fit_predict_score` | Fits the model to the training data and returns the total sum of weights for each label. |
| `predict` | Predicts the true labels of tasks when the model is fitted. |
| `predict_score` | Returns the total sum of weights for each label when the model is fitted. |

**Last updated:** March 31, 2023