crowdkit.aggregation.classification.majority_vote.MajorityVote
MajorityVote(self, on_missing_skill: str = 'error', default_skill: Optional[float] = None)
Majority Vote aggregation algorithm.
Majority vote is a straightforward approach to categorical aggregation: for each task, it outputs the label with the largest number of responses. Additionally, majority vote can be used when workers' votes are assigned different weights. In that case, the resulting label is the one with the largest sum of weights.
When two or more labels share the largest number of votes, the resulting label is the same for all tasks that have the same set of labels with equal vote counts.
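The weighted rule described above can be sketched in plain Python. This is an illustrative sketch, not crowdkit's implementation: the function name `weighted_majority_vote`, the `default_weight` fallback for workers without a skill, and the tie-breaking rule (pick the smallest tied label) are all assumptions made for this example.

```python
from collections import defaultdict


def weighted_majority_vote(assignments, skills=None, default_weight=1.0):
    """Return {task: label}, choosing the label with the largest summed weight.

    assignments: iterable of (task, worker, label) tuples.
    skills: optional {worker: weight}. Workers not listed get `default_weight`
            (a hypothetical fallback chosen for this sketch).
    """
    skills = skills or {}
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in assignments:
        scores[task][label] += skills.get(worker, default_weight)

    result = {}
    for task, label_scores in scores.items():
        best = max(label_scores.values())
        # Assumed tie-breaking rule: the smallest label among the tied ones.
        # Any rule that depends only on the tied labels is deterministic in
        # the same way, so equal ties always resolve to the same label.
        result[task] = min(l for l, s in label_scores.items() if s == best)
    return result


votes = [
    ('t1', 'p1', 0), ('t1', 'p2', 0), ('t1', 'p3', 1),
    ('t2', 'p1', 1), ('t2', 'p2', 0), ('t2', 'p3', 1),
]
worker_skills = {'p1': 0.5, 'p2': 0.7, 'p3': 0.4}

unweighted = weighted_majority_vote(votes)
weighted = weighted_majority_vote(votes, worker_skills)
```

With no `skills`, every vote counts as 1.0, so the function reduces to a plain majority vote.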
Parameters | Type | Description |
---|---|---|
default_skill | Optional[float] | Default weight value for a worker. |
labels_ | Optional[Series] | Tasks' labels. A pandas.Series indexed by |
skills_ | Optional[Series] | Workers' skills. A pandas.Series indexed by workers and holding each worker's skill. |
probas_ | Optional[DataFrame] | Tasks' label probability distributions. A pandas.DataFrame indexed by |
on_missing_skill | str | How to handle assignments done by workers with unknown skill. Possible values:
|
Examples:
Basic majority voting:

    from crowdkit.aggregation import MajorityVote
    from crowdkit.datasets import load_dataset

    df, gt = load_dataset('relevance-2')
    result = MajorityVote().fit_predict(df)

Weighted majority vote:
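
    import pandas as pd
    from crowdkit.aggregation import MajorityVote

    # Each row is one assignment: (task, worker, label)
    df = pd.DataFrame(
        [
            ['t1', 'p1', 0],
            ['t1', 'p2', 0],
            ['t1', 'p3', 1],
            ['t2', 'p1', 1],
            ['t2', 'p2', 0],
            ['t2', 'p3', 1],
        ],
        columns=['task', 'worker', 'label']
    )
    # Per-worker weights, indexed by worker
    skills = pd.Series({'p1': 0.5, 'p2': 0.7, 'p3': 0.4})
    # fit_predict is an instance method, so create a MajorityVote instance first
    result = MajorityVote().fit_predict(df, skills)
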
Method | Description |
---|---|
fit | Fit the model. |
fit_predict | Fit the model and return aggregated results. |
fit_predict_proba | Fit the model and return probability distributions on labels for each task. |
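One plausible reading of "probability distributions on labels for each task" is each label's share of the total vote weight for that task. The sketch below illustrates that reading in plain Python. It is an assumption for illustration only, not a statement of how `fit_predict_proba` is implemented, and the helper name `vote_share_probas` is hypothetical.

```python
from collections import defaultdict


def vote_share_probas(assignments, skills=None):
    """Return {task: {label: share}}, where shares sum to 1 for each task.

    assignments: iterable of (task, worker, label) tuples.
    skills: optional {worker: weight}. Workers not listed count as 1.0
            (an assumption of this sketch). Weights are assumed positive,
            so each task's total is non-zero.
    """
    skills = skills or {}
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in assignments:
        scores[task][label] += skills.get(worker, 1.0)

    probas = {}
    for task, label_scores in scores.items():
        total = sum(label_scores.values())
        # Normalize each label's summed weight by the task total.
        probas[task] = {label: s / total for label, s in label_scores.items()}
    return probas


proba_votes = [
    ('t1', 'p1', 0), ('t1', 'p2', 0), ('t1', 'p3', 1),
    ('t2', 'p1', 1), ('t2', 'p2', 0), ('t2', 'p3', 1),
]
probas = vote_share_probas(proba_votes)
```

Under this reading, the label returned by a majority vote is the one with the highest share in each task's distribution.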