crowdkit.aggregation.classification.gold_majority_vote.GoldMajorityVote
GoldMajorityVote(self)
Majority Vote for the case when a golden dataset (ground truth) exists for some tasks.
Estimates each worker's probability of giving a correct label (their skill) from their answers on the golden tasks. Then, for each task, it sums the skills of the workers who chose each label. The label with the largest sum is taken as the correct one.
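The two steps above can be sketched in plain Python. This is a minimal illustration, not crowdkit's implementation. It assumes each answer is a `(task, worker, label)` triple, takes a worker's skill to be their accuracy on the gold tasks, and uses the hypothetical helper names `worker_skills` and `weighted_vote`. The returned dictionaries play roughly the roles of `skills_`, `probas_` and `labels_`.

```python
from collections import defaultdict

# Hypothetical input format: one (task, worker, label) triple per answer,
# mirroring the 'task', 'worker' and 'label' columns of the answers DataFrame.
Answer = tuple[str, str, int]


def worker_skills(answers: list[Answer], gold: dict[str, int]) -> dict[str, float]:
    """Skill = share of a worker's answers on gold tasks that match the gold label."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for task, worker, label in answers:
        if task in gold:  # only tasks with known ground truth contribute
            total[worker] += 1
            correct[worker] += int(label == gold[task])
    # Workers who answered no gold task get no entry (and hence no skill).
    return {worker: correct[worker] / total[worker] for worker in total}


def weighted_vote(
    answers: list[Answer], skills: dict[str, float]
) -> tuple[dict[str, dict[int, float]], dict[str, int]]:
    """Sum the skills of the workers behind each label, normalize, and pick the argmax."""
    scores: dict[str, dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for task, worker, label in answers:
        scores[task][label] += skills[worker]  # KeyError if the worker has no skill
    probas: dict[str, dict[int, float]] = {}
    labels: dict[str, int] = {}
    for task, by_label in scores.items():
        total = sum(by_label.values())
        # Assumption: fall back to a uniform distribution if every voter has skill 0.
        probas[task] = {
            label: (score / total if total > 0 else 1 / len(by_label))
            for label, score in by_label.items()
        }
        # max() returns the first label with the highest score, so ties go to
        # the label seen first for this task.
        labels[task] = max(by_label, key=by_label.__getitem__)
    return probas, labels


answers = [
    ('t1', 'p1', 0), ('t1', 'p2', 0), ('t1', 'p3', 1),
    ('t2', 'p1', 1), ('t2', 'p2', 0), ('t2', 'p3', 1),
]
skills = worker_skills(answers, gold={'t1': 0})
probas, labels = weighted_vote(answers, skills)
print(skills)  # {'p1': 1.0, 'p2': 1.0, 'p3': 0.0}
```

Ties are broken here by whichever label was seen first for a task. The library's tie-breaking may differ, so treat the sketch's output for tied tasks as arbitrary.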
For example, suppose you have 10k tasks completed by 3k different workers, and for 100 of those tasks you already know the ground-truth labels. First call `fit` to compute each worker's share of correct labels on those 100 tasks, then call `predict` to infer labels for all 10k tasks.
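It's necessary that every worker in the dataset passed to `predict` also appears in the answers dataset passed to `fit`, with at least one answer on a gold task, because skills are estimated only from gold tasks.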
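The requirement matters because, in a skill-weighted vote, a worker with no estimated skill has no weight to contribute. The hypothetical helper below (not part of crowdkit) lists such workers before aggregation, assuming skills are held in a plain `worker -> skill` dict.

```python
# Hypothetical pre-flight check: which workers in the data to aggregate
# have no skill estimate (e.g. they never answered a gold task)?
def workers_without_skill(
    answers: list[tuple[str, str, int]], skills: dict[str, float]
) -> list[str]:
    seen = {worker for _task, worker, _label in answers}
    return sorted(seen - set(skills))


skills = {'p1': 1.0, 'p2': 1.0, 'p3': 0.0}
new_answers = [('t3', 'p1', 0), ('t3', 'p4', 1)]
missing = workers_without_skill(new_answers, skills)
print(missing)  # ['p4']
```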
Attributes | Type | Description |
---|---|---|
labels_ | Optional[Series] | Tasks' labels. A pandas.Series indexed by task, holding each task's most likely true label. |
skills_ | Series | Workers' skills. A pandas.Series indexed by worker, holding each worker's skill. |
probas_ | DataFrame | Tasks' label probability distributions. A pandas.DataFrame indexed by task with one column per label, holding the probability that the task's true label is that label. |
Examples:
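```python
import pandas as pd
from crowdkit.aggregation import GoldMajorityVote

# Answers: one row per (task, worker, label)
df = pd.DataFrame(
    [
        ['t1', 'p1', 0],
        ['t1', 'p2', 0],
        ['t1', 'p3', 1],
        ['t2', 'p1', 1],
        ['t2', 'p2', 0],
        ['t2', 'p3', 1],
    ],
    columns=['task', 'worker', 'label']
)
# Ground truth is known only for task t1
true_labels = pd.Series({'t1': 0})

gold_mv = GoldMajorityVote()
# Estimate skills from t1, then aggregate labels for all tasks
result = gold_mv.fit_predict(df, true_labels)
```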
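Worked by hand, the example goes as follows, assuming a worker's skill is their share of correct answers on the gold tasks and a label's score is the sum of the skills of the workers who chose it:

$$
\begin{aligned}
&\text{skill}(p_1) = \tfrac{1}{1} = 1,\quad \text{skill}(p_2) = \tfrac{1}{1} = 1,\quad \text{skill}(p_3) = \tfrac{0}{1} = 0\\
&t_1:\ \text{score}(0) = 1 + 1 = 2,\quad \text{score}(1) = 0\\
&t_2:\ \text{score}(1) = 1 + 0 = 1,\quad \text{score}(0) = 1
\end{aligned}
$$

So `t1` gets label 0, while `t2` is a tie. Its label then depends on how the implementation breaks ties, so the value returned for `t2` should not be read as evidence for either label.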
Method | Description |
---|---|
fit | Estimate the workers' skills. |
fit_predict | Fit the model and return aggregated results. |
fit_predict_proba | Fit the model and return probability distributions on labels for each task. |
predict | Infer the true labels when the model is fitted. |
predict_proba | Return probability distributions on labels for each task when the model is fitted. |