crowdkit.aggregation.classification.gold_majority_vote.GoldMajorityVote
GoldMajorityVote(self)
The Gold Majority Vote model is used when a golden dataset (ground truth) exists for some tasks.
It calculates the probability of a correct label for each worker based on the golden set. After that, the sum of the probabilities of each label is calculated for each task. The correct label is the one with the greatest sum of the probabilities.
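The two steps above can be written compactly. The notation is introduced here for illustration only and is not part of the library: $G_w$ is the set of golden tasks answered by worker $w$, $y_{w,t}$ is the label worker $w$ gave to task $t$, $g_t$ is the ground truth label of a golden task, $s_w$ is the worker's estimated skill, and $\hat{y}_t$ is the aggregated label.

$$
s_w = \frac{\left|\{\, t \in G_w : y_{w,t} = g_t \,\}\right|}{\left|G_w\right|},
\qquad
\hat{y}_t = \arg\max_{l} \sum_{w \,:\, y_{w,t} = l} s_w
$$

Reading the "probability of a correct label" as the share of a worker's golden answers that match the ground truth is an assumption made for this sketch.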
For example, suppose you have 10 000 tasks completed by 3 000 different workers, and you already know the ground truth labels for 100 of those tasks. First, call `fit` to calculate the percentage of correct labels for each worker. Then call `predict` to calculate labels for all 10 000 tasks.
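The two-step workflow can be sketched in plain Python. The class below is a hypothetical stand-in written for illustration, not crowdkit's implementation: the name `GoldVoteSketch`, the tuple-based input format, and the `KeyError` for workers without a skill estimate are all assumptions.

```python
from collections import defaultdict


class GoldVoteSketch:
    """Hypothetical stand-in for the fit/predict workflow; not crowdkit code."""

    def fit(self, responses, gold):
        # responses: iterable of (task, worker, label); gold: {task: true_label}
        correct = defaultdict(int)
        total = defaultdict(int)
        for task, worker, label in responses:
            if task in gold:
                total[worker] += 1
                correct[worker] += int(label == gold[task])
        # Skill = share of a worker's golden-task answers that match the truth.
        self.skills_ = {w: correct[w] / total[w] for w in total}
        return self

    def predict(self, responses):
        scores = defaultdict(lambda: defaultdict(float))
        for task, worker, label in responses:
            if worker not in self.skills_:
                raise KeyError(f"worker {worker!r} has no skill estimate from fit")
            # Each vote counts with the weight of the worker's skill.
            scores[task][label] += self.skills_[worker]
        # The label with the greatest summed skill wins for each task.
        return {t: max(by_label, key=by_label.get) for t, by_label in scores.items()}


responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "cat"), ("t2", "w3", "cat"),
    ("t3", "w1", "dog"), ("t3", "w2", "dog"), ("t3", "w3", "cat"),
]
gold = {"t1": "cat", "t2": "dog"}  # ground truth is known for t1 and t2 only

model = GoldVoteSketch().fit(responses, gold)
labels = model.predict(responses)
```

In this toy data, two of three workers label `t2` as `cat`, yet the skill-weighted vote picks `dog`, because the only worker who was right on both golden tasks voted for it.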
The following rules must be observed:
- All workers in the dataset passed to `predict` must be included in the response dataset that is submitted to `fit`.

Parameters | Type | Description |
---|---|---|
labels_ | Optional[Series] | The task labels. The |
skills_ | Optional[Series] | The workers' skills. The |
probas_ | Optional[DataFrame] | The probability distributions of task labels. The |
Examples:
```python
import pandas as pd
from crowdkit.aggregation import GoldMajorityVote

df = pd.DataFrame(
    [
        ['t1', 'p1', 0],
        ['t1', 'p2', 0],
        ['t1', 'p3', 1],
        ['t2', 'p1', 1],
        ['t2', 'p2', 0],
        ['t2', 'p3', 1],
    ],
    columns=['task', 'worker', 'label']
)
true_labels = pd.Series({'t1': 0})
gold_mv = GoldMajorityVote()
result = gold_mv.fit_predict(df, true_labels)
```
Method | Description |
---|---|
fit | Fits the model to the training data. |
fit_predict | Fits the model to the training data and returns the aggregated results. |
fit_predict_proba | Fits the model to the training data and returns probability distributions of labels for each task. |
predict | Predicts the true labels of tasks when the model is fitted. |
predict_proba | Returns probability distributions of labels for each task when the model is fitted. |
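One plausible way to turn summed worker skills into a per-task probability distribution is to normalize the scores so they sum to one. The sketch below assumes that reading; the helper name `normalize_scores` and the uniform fallback for all-zero scores are hypothetical and are not taken from the library.

```python
def normalize_scores(scores):
    """Turn {label: summed worker skill} for one task into a distribution."""
    total = sum(scores.values())
    if total == 0:
        # Assumption: if no voting worker has any skill, fall back to uniform.
        return {label: 1 / len(scores) for label in scores}
    return {label: value / total for label, value in scores.items()}


# Summed skills for a single task, e.g. one worker with skill 1.0 voted "dog"
# and one worker with skill 0.5 voted "cat".
probas = normalize_scores({"cat": 0.5, "dog": 1.0})
best_label = max(probas, key=probas.get)
```

Under this reading, the label returned by `predict` is the one with the highest probability in the distribution returned by `predict_proba`.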
Last updated: March 31, 2023