`crowdkit.aggregation.classification.m_msr.MMSR`


`MMSR( self, n_iter: int = 10000, tol: float = 1e-10, random_state: Optional[int] = 0, observation_matrix: ... = ..., covariation_matrix: ... = ..., n_common_tasks: ... = ..., n_workers: int = 0, n_tasks: int = 0, n_labels: int = 0, labels_mapping: Dict[Any, int] = ..., workers_mapping: Dict[Any, int] = ..., tasks_mapping: Dict[Any, int] = ...)`

Matrix Mean-Subsequence-Reduced Algorithm.

The M-MSR algorithm assumes that workers have different levels of expertise and that each worker $i$ is associated with a "skill" $s_i$, the probability that the worker answers a given task correctly. The skills form the vector $\boldsymbol{s}$. Under this model, one can show that

$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T$, where $M$ is the total number of classes, $\widetilde{C}$ is the covariation matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix of the same size as $\widetilde{C}$.

Recovering the skill vector $\boldsymbol{s}$ therefore reduces to a rank-one matrix completion problem. It is a *completion* problem because $\widetilde{C}$ can only be estimated for pairs of workers who labeled tasks in common.
M-MSR is an iterative algorithm for *robust* rank-one matrix completion, so its output is an
estimate of the vector $\boldsymbol{s}$.
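To see why a subset of the off-diagonal entries of $\boldsymbol{s}\boldsymbol{s}^T$ can be enough to pin down $\boldsymbol{s}$, consider the noise-free case: for three distinct workers $i, j, k$, writing $A = \boldsymbol{s}\boldsymbol{s}^T$, we have $s_i^2 = A_{ij}A_{ik}/A_{jk}$. The sketch below is a simplified stand-in, **not** the M-MSR update: it takes the median of this estimate over all usable $(j, k)$ pairs so that a few corrupted entries do not dominate. The helper `recover_rank_one`, the boolean `mask` of observed entries, and the assumption that all skills are positive are illustrative choices.

```python
import itertools

import numpy as np


def recover_rank_one(a: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Estimate s from the observed off-diagonal entries of a ~ s s^T.

    Hypothetical helper, not the M-MSR algorithm: for distinct i, j, k with
    (i, j), (i, k) and (j, k) all observed, s_i**2 = a_ij * a_ik / a_jk.
    The median over all such (j, k) pairs limits the effect of a few
    corrupted entries. Assumes all skills are positive.
    """
    n = a.shape[0]
    s = np.full(n, np.nan)  # stays NaN where no usable triplet exists
    for i in range(n):
        estimates = []
        others = [j for j in range(n) if j != i]
        for j, k in itertools.combinations(others, 2):
            if mask[i, j] and mask[i, k] and mask[j, k] and a[j, k] > 0:
                ratio = a[i, j] * a[i, k] / a[j, k]
                if ratio > 0:
                    estimates.append(np.sqrt(ratio))
        if estimates:
            s[i] = np.median(estimates)
    return s


# Usage: exact rank-one data with one adversarially corrupted pair of entries.
s_true = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6])
a = np.outer(s_true, s_true)
a[0, 6] = a[6, 0] = 0.99  # true value is 0.9 * 0.6 = 0.54
mask = ~np.eye(len(s_true), dtype=bool)  # the diagonal is never observed
print(np.round(recover_rank_one(a, mask), 3))
```

With seven workers, the corrupted entry affects at most 5 of the 15 triplet estimates for any worker, so the median still returns the true skill. M-MSR itself addresses the harder setting of sparse observations and adversarial workers.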

Finally, the labels are aggregated by weighted majority vote, where the answers of worker $i$ have weight $\log \frac{(M-1)s_i}{1-s_i}$.
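The voting step can be sketched in plain Python as follows, assuming the skills $s_i$ have already been estimated and lie strictly between 0 and 1. The function `weighted_majority_vote` and its `(task, worker, label)` input format are hypothetical and not part of the crowdkit API. The per-label weight sums it returns correspond to the "total sum of weights for each label" mentioned in the methods table.

```python
import math
from collections import defaultdict


def weighted_majority_vote(answers, skills, n_labels):
    """Aggregate (task, worker, label) triples by weighted majority vote.

    Hypothetical helper: each worker's vote counts with weight
    log((M - 1) * s / (1 - s)), where M = n_labels and s is the
    worker's skill, assumed to lie strictly in (0, 1).
    """
    weights = {
        worker: math.log((n_labels - 1) * s / (1 - s))
        for worker, s in skills.items()
    }
    # scores[task][label] = total weight of the workers who chose that label
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in answers:
        scores[task][label] += weights[worker]
    labels = {
        task: max(label_scores, key=label_scores.get)
        for task, label_scores in scores.items()
    }
    return labels, {task: dict(ls) for task, ls in scores.items()}


answers = [
    ("t1", "w1", "a"), ("t1", "w2", "b"), ("t1", "w3", "b"),
    ("t2", "w1", "b"), ("t2", "w2", "b"), ("t2", "w3", "a"),
]
skills = {"w1": 0.95, "w2": 0.6, "w3": 0.6}
labels, scores = weighted_majority_vote(answers, skills, n_labels=2)
print(labels)  # {'t1': 'a', 't2': 'b'}
```

On `t1`, the single highly skilled worker (weight $\log 19 \approx 2.94$) outvotes two weaker ones (combined weight $2\log 1.5 \approx 0.81$). A worker with $s_i < 1/M$ receives a negative weight, so that worker's vote counts against the label they chose.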

Qianqian Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion.
*34th Conference on Neural Information Processing Systems (NeurIPS 2020)*

https://arxiv.org/abs/2010.12181

Parameters | Type | Description |
---|---|---|
`n_iter` | int | The maximum number of iterations of the M-MSR algorithm. |
`tol` | float | Convergence threshold. |
`random_state` | Optional[int] | Seed number for the random initialization. |
`labels_` | Optional[Series] | Tasks' labels. A pandas.Series indexed by tasks. |
`skills_` | Optional[Series] | Workers' skills. A pandas.Series indexed by workers, holding each worker's skill. |
`scores_` | Optional[DataFrame] | Tasks' label scores. A pandas.DataFrame indexed by tasks. |

**Examples:**

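```python
from crowdkit.aggregation import MMSR
from crowdkit.datasets import load_dataset

# df holds the workers' answers, gt the ground-truth labels
df, gt = load_dataset('relevance-2')
mmsr = MMSR()
result = mmsr.fit_predict(df)  # one aggregated label per task
```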

Method | Description |
---|---|
fit | Estimate the workers' skills. |
fit_predict | Fit the model and return aggregated results. |
fit_predict_score | Fit the model and return the total sum of weights for each label. |
predict | Infer the true labels when the model is fitted. |
predict_score | Return total sum of weights for each label when the model is fitted. |