crowdkit.aggregation.classification.m_msr.MMSR
MMSR( self, n_iter: int = 10000, tol: float = 1e-10, random_state: Optional[int] = 0, observation_matrix: ... = _Nothing.NOTHING, covariation_matrix: ... = _Nothing.NOTHING, n_common_tasks: ... = _Nothing.NOTHING, n_workers: int = 0, n_tasks: int = 0, n_labels: int = 0, labels_mapping: Dict[Any, int] = _Nothing.NOTHING, workers_mapping: Dict[Any, int] = _Nothing.NOTHING, tasks_mapping: Dict[Any, int] = _Nothing.NOTHING)
The Matrix Mean-Subsequence-Reduced Algorithm (M-MSR) model assumes that workers have different expertise levels and represents them as a vector of "skills" $\boldsymbol{s}$ whose entries $s_i$ give the probability that worker $i$ answers a given task correctly. Under this assumption, the skills can be estimated by solving a rank-one matrix completion problem:
$\mathbb{E}\left[\frac{M}{M-1}\widetilde{C}-\frac{1}{M-1}\boldsymbol{1}\boldsymbol{1}^T\right] = \boldsymbol{s}\boldsymbol{s}^T$,
where $M$ is the total number of classes, $\widetilde{C}$ is a covariance matrix between workers, and $\boldsymbol{1}\boldsymbol{1}^T$ is the all-ones matrix of the same size as $\widetilde{C}$.
Thus, estimating the skill vector $\boldsymbol{s}$ is equivalent to a rank-one matrix completion problem. M-MSR is an iterative algorithm for robust rank-one matrix completion, so its result is an estimator of the vector $\boldsymbol{s}$.
The aggregation is then weighted majority voting, where the vote of worker $i$ has weight $\log \frac{(M-1)s_i}{1-s_i}$.
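The final voting step can be illustrated with a small standard-library sketch. It is not Crowd-Kit's implementation: it assumes the skills have already been estimated, that the weight of worker $i$ is $\log \frac{(M-1)s_i}{1-s_i}$ as in the M-MSR description, and that skills are clipped away from 0 and 1 to keep the logarithm finite. The names `skill_to_weight`, `weighted_majority_vote`, and `eps` are hypothetical.

```python
import math
from collections import defaultdict


def skill_to_weight(skill, n_labels, eps=1e-10):
    """Turn a skill (probability of a correct answer) into a voting weight.

    Assumed formula: log((M - 1) * s / (1 - s)), with M = n_labels.
    The clipping by eps is an assumption added to avoid log(0) and division by zero.
    """
    s = min(max(skill, eps), 1.0 - eps)
    return math.log((n_labels - 1) * s / (1.0 - s))


def weighted_majority_vote(responses, skills, n_labels):
    """Aggregate (task, worker, label) triples by summing worker weights per label."""
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in responses:
        scores[task][label] += skill_to_weight(skills[worker], n_labels)
    # For every task, pick the label with the largest total weight.
    return {
        task: max(label_scores, key=label_scores.get)
        for task, label_scores in scores.items()
    }


# Toy data: one reliable worker and two workers who are barely better than chance.
responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "cat"),
]
skills = {"w1": 0.95, "w2": 0.6, "w3": 0.55}

labels = weighted_majority_vote(responses, skills, n_labels=2)
print(labels)
```

On task `t1` the single reliable worker outweighs the two weaker workers, which is the behaviour that distinguishes weighted voting from a plain majority vote. A worker whose skill equals chance level (0.5 for two labels) gets weight zero.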
Q. Ma and Alex Olshevsky. Adversarial Crowdsourcing Through Robust Rank-One Matrix Completion.
34th Conference on Neural Information Processing Systems (NeurIPS 2020)
https://arxiv.org/abs/2010.12181
Parameters | Type | Description |
---|---|---|
n_iter | int | The maximum number of iterations. |
tol | float | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the `tol` parameter. |
random_state | Optional[int] | The seed number for the random initialization. |
_observation_matrix | - | The matrix representing which workers give responses to which tasks. |
_covariation_matrix | - | The matrix representing the covariance between workers. |
_n_common_tasks | - | The matrix representing workers with tasks in common. |
_n_workers | - | The number of workers. |
_n_tasks | - | The number of tasks that are assigned to workers. |
_n_labels | - | The number of possible labels for a series of classification tasks. |
_labels_mapping | - | The mapping of labels and integer values. |
_workers_mapping | - | The mapping of workers and integer values. |
_tasks_mapping | - | The mapping of tasks and integer values. |
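labels_ | Optional[Series] | The task labels. The `pandas.Series` data is indexed by `task` so that `labels.loc[task]` is the most likely true label of the task. |
skills_ | Optional[Series] | The workers' skills. The `pandas.Series` data is indexed by `worker` and has the corresponding worker skill. |
scores_ | Optional[DataFrame] | The task label scores. The `pandas.DataFrame` data is indexed by `task` so that `result.loc[task, label]` is a score of `label` for `task`. |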
loss_history_ | List[float] | A list of loss values during training. |
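How `n_iter`, `tol`, and `loss_history_` relate can be sketched with a generic iteration loop. This is only an illustration of the stopping rule described in the table (stop when the loss change falls below `tol`, or after `n_iter` iterations), not the M-MSR update itself. The function `iterate_until_converged` and the toy `halve` step are hypothetical.

```python
def iterate_until_converged(step, x0, n_iter=10000, tol=1e-10):
    """Run `step` until the loss changes by less than `tol` or `n_iter` is reached.

    `step` takes the current state and returns (new_state, loss).
    """
    x = x0
    loss_history = []
    prev_loss = float("inf")
    for _ in range(n_iter):
        x, loss = step(x)
        loss_history.append(loss)
        # Converged: the loss barely changed since the previous iteration.
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return x, loss_history


def halve(x):
    """Toy update: halve the state and report its square as the loss."""
    new_x = x / 2.0
    return new_x, new_x ** 2


# With the default tolerance the loop stops long before n_iter is exhausted.
x_final, loss_history = iterate_until_converged(halve, 1.0)

# With a small iteration budget the loop stops at n_iter instead.
_, short_history = iterate_until_converged(halve, 1.0, n_iter=5)

print(len(loss_history), len(short_history))
```

A smaller `tol` makes the loop run longer before it is considered converged, while `n_iter` bounds the running time when the loss keeps changing.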
Examples:
    from crowdkit.aggregation import MMSR
    from crowdkit.datasets import load_dataset
    df, gt = load_dataset('relevance-2')
    mmsr = MMSR()
    result = mmsr.fit_predict(df)
Method | Description |
---|---|
fit | Fits the model to the training data. |
fit_predict | Fits the model to the training data and returns the aggregated results. |
fit_predict_score | Fits the model to the training data and returns the total sum of weights for each label. |
predict | Predicts the true labels of tasks when the model is fitted. |
predict_score | Returns the total sum of weights for each label when the model is fitted. |
Last updated: March 31, 2023