MajorityVote

crowdkit.aggregation.classification.majority_vote.MajorityVote

MajorityVote(
    on_missing_skill: str = 'error',
    default_skill: Optional[float] = None
)

The Majority Vote aggregation algorithm is a straightforward approach to categorical aggregation: for each task, it outputs the label with the largest number of responses. Majority Vote can also be used when different weights are assigned to workers' votes. In this case, the resulting label is the one with the largest sum of weights.

Note

If two or more labels tie for the largest number of votes, the tie is resolved consistently: all tasks that have the same set of labels with the same number of votes get the same resulting label.
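The rule above, including consistent tie resolution, can be sketched in plain Python. This is a minimal illustration, not crowdkit's implementation: the function name `majority_vote`, the `(task, worker, label)` tuple layout, and the tie-breaking rule (pick the smallest tied label in sorted order) are assumptions made for this example. Without weights every vote counts as 1; with a `weights` mapping from worker to weight, each vote counts as its worker's weight.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Optional, Tuple

# One assignment: (task, worker, label), mirroring the task/worker/label columns.
Assignment = Tuple[Hashable, Hashable, Hashable]


def majority_vote(
    assignments: Iterable[Assignment],
    weights: Optional[Mapping[Hashable, float]] = None,
) -> Dict[Hashable, Hashable]:
    """Hypothetical helper: return the winning label for each task.

    Ties are broken by taking the smallest tied label in sorted order
    (an assumption for this sketch; labels must be mutually comparable),
    so tasks with identical vote totals always get the same label.
    """
    # scores[task][label] = number of votes, or sum of worker weights.
    scores: Dict[Hashable, Dict[Hashable, float]] = defaultdict(lambda: defaultdict(float))
    for task, worker, label in assignments:
        scores[task][label] += 1.0 if weights is None else weights[worker]

    result: Dict[Hashable, Hashable] = {}
    for task, label_scores in scores.items():
        best = max(label_scores.values())
        tied = sorted(label for label, s in label_scores.items() if s == best)
        result[task] = tied[0]
    return result


votes = [
    ("t1", "w1", "cat"), ("t1", "w2", "dog"), ("t1", "w3", "cat"),
    ("t2", "w1", "dog"), ("t2", "w2", "cat"),  # t2 is a 1-1 tie
]
print(majority_vote(votes))  # {'t1': 'cat', 't2': 'cat'}
```

Breaking ties by sort order rather than by the order in which votes arrive is what makes the result depend only on the per-label totals.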

Parameters description

Parameters	Type	Description
`default_skill`	Optional[float]	Default worker weight, used for workers with an unknown skill when `on_missing_skill` is `'value'`.
`labels_`	Optional[Series]	The task labels, available after fitting. The `pandas.Series` data is indexed by `task` so that `labels_.loc[task]` is the most likely true label of the task.
`skills_`	Optional[Series]	The workers' skills, available after fitting. The `pandas.Series` data is indexed by `worker` and holds the corresponding worker skill.
`probas_`	Optional[DataFrame]	The probability distributions of task labels, available after fitting. The `pandas.DataFrame` data is indexed by `task` so that `probas_.loc[task, label]` is the probability that the true label of `task` is `label`. Each probability is in the range from 0 to 1, and the probabilities for each task sum to 1.
`on_missing_skill`	str	Specifies how to handle assignments performed by workers with an unknown skill. Possible values: `'error'`: raises an exception if at least one assignment was performed by a worker with an unknown skill; `'ignore'`: drops assignments performed by workers with an unknown skill during prediction, and raises an exception if some task is left with no assignments from workers with a known skill; `'value'`: uses `default_skill` as the skill of any worker whose skill is missing.
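The three `on_missing_skill` policies can be illustrated with a small standalone function. This is a hedged sketch of the behavior described in the table, not crowdkit's code: the helper name `attach_skills` is hypothetical, and requiring `default_skill` to be set for `'value'` is an assumption of this sketch.

```python
from typing import Hashable, List, Mapping, Optional, Tuple

Assignment = Tuple[Hashable, Hashable, Hashable]  # (task, worker, label)


def attach_skills(
    assignments: List[Assignment],
    skills: Mapping[Hashable, float],
    on_missing_skill: str = "error",
    default_skill: Optional[float] = None,
) -> List[Tuple[Hashable, Hashable, Hashable, float]]:
    """Hypothetical helper: pair each assignment with its worker's skill."""
    if on_missing_skill not in ("error", "ignore", "value"):
        raise ValueError(f"unknown on_missing_skill: {on_missing_skill!r}")
    # Assumption for this sketch: 'value' is meaningless without a default.
    if on_missing_skill == "value" and default_skill is None:
        raise ValueError("default_skill must be set when on_missing_skill='value'")

    out = []
    for task, worker, label in assignments:
        if worker in skills:
            out.append((task, worker, label, skills[worker]))
        elif on_missing_skill == "error":
            raise ValueError(f"worker {worker!r} has no known skill")
        elif on_missing_skill == "value":
            out.append((task, worker, label, default_skill))
        # 'ignore': the assignment is simply dropped.

    if on_missing_skill == "ignore":
        # Every task must keep at least one assignment with a known skill.
        all_tasks = {task for task, _, _ in assignments}
        kept_tasks = {task for task, _, _, _ in out}
        missing = all_tasks - kept_tasks
        if missing:
            raise ValueError(f"tasks without known skills: {sorted(missing, key=str)}")
    return out


data = [("t1", "p1", 0), ("t1", "p4", 1)]
print(attach_skills(data, {"p1": 0.5}, "ignore"))      # [('t1', 'p1', 0, 0.5)]
print(attach_skills(data, {"p1": 0.5}, "value", 0.3))  # [('t1', 'p1', 0, 0.5), ('t1', 'p4', 1, 0.3)]
```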

Examples:

Basic Majority Vote:

from crowdkit.aggregation import MajorityVote
from crowdkit.datasets import load_dataset
df, gt = load_dataset('relevance-2')
result = MajorityVote().fit_predict(df)

Weighted Majority Vote:

import pandas as pd
from crowdkit.aggregation import MajorityVote
df = pd.DataFrame(
    [
        ['t1', 'p1', 0],
        ['t1', 'p2', 0],
        ['t1', 'p3', 1],
        ['t2', 'p1', 1],
        ['t2', 'p2', 0],
        ['t2', 'p3', 1],
    ],
    columns=['task', 'worker', 'label']
)
skills = pd.Series({'p1': 0.5, 'p2': 0.7, 'p3': 0.4})
# Instantiate the aggregator, then pass each worker's skill as their vote weight.
result = MajorityVote().fit_predict(df, skills)
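The weighted sums for the data above can be checked by hand. The snippet below recomputes them with the standard library only; it illustrates the arithmetic of weighted voting and does not call crowdkit, so the expected labels assume that `skills` act as plain vote weights as described at the top of this page.

```python
from collections import defaultdict

rows = [
    ("t1", "p1", 0), ("t1", "p2", 0), ("t1", "p3", 1),
    ("t2", "p1", 1), ("t2", "p2", 0), ("t2", "p3", 1),
]
skills = {"p1": 0.5, "p2": 0.7, "p3": 0.4}

# totals[task][label] = sum of the skills of the workers who chose that label.
totals = defaultdict(lambda: defaultdict(float))
for task, worker, label in rows:
    totals[task][label] += skills[worker]

# t1: label 0 -> 0.5 + 0.7 = 1.2, label 1 -> 0.4             => 0
# t2: label 0 -> 0.7,            label 1 -> 0.5 + 0.4 = 0.9  => 1
winners = {task: max(scores, key=scores.get) for task, scores in totals.items()}
print(winners)  # {'t1': 0, 't2': 1}
```

In this particular data the weighted winners coincide with the unweighted ones; weights change the outcome when, for example, one high-skill worker outweighs two low-skill workers who agree with each other.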

Methods summary

Method	Description
fit	Fits the model to the training data.
fit_predict	Fits the model to the training data and returns the aggregated results.
fit_predict_proba	Fits the model to the training data and returns probability distributions of labels for each task.
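As a rough picture of what `fit_predict_proba` returns, the sketch below assumes that each task's distribution is its labels' (weighted) vote shares, normalized so they sum to 1; check the source code for the exact computation. It is plain Python, and the helper name `vote_distributions` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Optional, Tuple


def vote_distributions(
    assignments: Iterable[Tuple[Hashable, Hashable, Hashable]],
    weights: Optional[Mapping[Hashable, float]] = None,
) -> Dict[Hashable, Dict[Hashable, float]]:
    """Hypothetical helper: per-task share of (weighted) votes for each label.

    Assumes positive weights. Labels that received no votes for a task are
    absent from that task's dict, i.e. they have probability 0.
    """
    scores: Dict[Hashable, Dict[Hashable, float]] = defaultdict(lambda: defaultdict(float))
    for task, worker, label in assignments:
        scores[task][label] += 1.0 if weights is None else weights[worker]

    probas: Dict[Hashable, Dict[Hashable, float]] = {}
    for task, label_scores in scores.items():
        total = sum(label_scores.values())
        probas[task] = {label: s / total for label, s in label_scores.items()}
    return probas


votes = [("t1", "w1", "a"), ("t1", "w2", "a"), ("t1", "w3", "b"), ("t1", "w4", "a")]
print(vote_distributions(votes))  # {'t1': {'a': 0.75, 'b': 0.25}}
```

Taking the most probable label of each distribution gives the same answer as the majority vote itself, which is why `fit_predict` and `fit_predict_proba` agree on the winning label.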

Last updated: March 31, 2023
