crowdkit.aggregation.embeddings.rasa.RASA
RASA(self, n_iter: int = 100, tol: float = 1e-09, alpha: float = 0.05)
The Reliability Aware Sequence Aggregation (RASA) algorithm consists of three steps.
Step 1. Encode the worker answers into embeddings.
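Step 2. Estimate the global workers' reliabilities $\beta$ by iteratively performing two steps:
1. For each task, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \beta_k e_i^k}{\sum_k \beta_k}$
2. For each worker, estimate the global reliability: $\beta_k = \frac{\chi^2_{(\alpha/2, |\mathcal{V}_k|)}}{\sum_i \left(\|e_i^k - \hat{e}_i\|^2\right)}$, where $\mathcal{V}_k$ is the set of tasks completed by the worker $k$.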
Step 3. Estimate the aggregated result. For each task, it is the output whose embedding is closest to the aggregated embedding $\hat{e}_i$.
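The three steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the names `rasa_sketch`, `chi2_upper_quantile`, and `sq_dist` are hypothetical; the chi-squared quantile is approximated with the Wilson-Hilferty formula (swapped in so that only the standard library is needed) and is assumed to be the upper-tail quantile; reliabilities are assumed to start at 1; the loss is a stand-in (total squared distance); and closeness in Step 3 is assumed to be squared Euclidean distance.

```python
import math
from statistics import NormalDist


def chi2_upper_quantile(tail_prob, dof):
    """Approximate the chi-squared value with upper-tail probability tail_prob.

    Wilson-Hilferty approximation (an assumption of this sketch); an exact
    quantile function could be substituted.
    """
    z = NormalDist().inv_cdf(1.0 - tail_prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def rasa_sketch(answers, n_iter=100, tol=1e-9, alpha=0.05, eps=1e-5):
    """answers: list of (task, worker, output, embedding) tuples (Step 1 done).

    Returns (chosen output per task, reliability per worker).
    """
    tasks = sorted({t for t, _, _, _ in answers})
    workers = sorted({w for _, w, _, _ in answers})
    n_done = {w: sum(1 for _, w2, _, _ in answers if w2 == w) for w in workers}
    # Numerator of the reliability update: one quantile per worker.
    numer = {w: chi2_upper_quantile(alpha / 2.0, n_done[w]) for w in workers}
    beta = {w: 1.0 for w in workers}  # assumed uniform starting reliabilities
    prev_loss = None
    agg = {}

    for _ in range(n_iter):
        # Step 2, first update: reliability-weighted mean embedding per task.
        for task in tasks:
            rows = [(beta[w], e) for t, w, _, e in answers if t == task]
            total = sum(b for b, _ in rows)
            dim = len(rows[0][1])
            agg[task] = [sum(b * e[d] for b, e in rows) / total for d in range(dim)]

        # Step 2, second update: quantile divided by summed squared distance.
        dist = {w: 0.0 for w in workers}
        for t, w, _, e in answers:
            dist[w] += sq_dist(e, agg[t])
        # eps guards against division by zero (an assumption of this sketch).
        beta = {w: numer[w] / max(dist[w], eps) for w in workers}

        # Stand-in loss: stop once it changes by less than tol.
        loss = sum(dist.values())
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss

    # Step 3: per task, pick the output whose embedding is closest to agg.
    outputs = {}
    for task in tasks:
        rows = [(o, e) for t, _, o, e in answers if t == task]
        outputs[task] = min(rows, key=lambda r: sq_dist(r[1], agg[task]))[0]
    return outputs, beta


answers = [
    ("t1", "p1", "a", [1.0, 0.0]),
    ("t1", "p2", "a", [1.0, 0.0]),
    ("t1", "p3", "b", [0.0, 1.0]),
]
outputs, beta = rasa_sketch(answers)
print(outputs)  # {'t1': 'a'}
```

The two workers who agree end up with a much larger reliability than the dissenting one, so the aggregated embedding drifts toward their answer.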
Jiyi Li, Fumiyo Fukumoto. A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation.
In Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, Hong Kong, China (November 3, 2019), 24–28.
https://doi.org/10.18653/v1/D19-5904
Parameters | Type | Description |
---|---|---|
n_iter | int | The maximum number of iterations. |
tol | float | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the `tol` parameter. |
alpha | float | The significance level of the chi-squared distribution quantiles in the worker reliability formula (Step 2). |
embeddings_and_outputs_ | DataFrame | The task embeddings and outputs. The `pandas.DataFrame` data is indexed by `task` and has the `embedding` and `output` columns. |
loss_history_ | List[float] | A list of loss values during training. |
Examples:
    import numpy as np
    import pandas as pd
    from crowdkit.aggregation import RASA

    df = pd.DataFrame(
        [
            ['t1', 'p1', 'a', np.array([1.0, 0.0])],
            ['t1', 'p2', 'a', np.array([1.0, 0.0])],
            ['t1', 'p3', 'b', np.array([0.0, 1.0])]
        ],
        columns=['task', 'worker', 'output', 'embedding']
    )
    result = RASA().fit_predict(df)
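To see why this example should favor output `a`, one iteration of Step 2 can be worked by hand. The derivation below is a sketch under assumptions: the task's aggregated embedding is the reliability-weighted mean of the worker embeddings, each worker's reliability is a chi-squared quantile divided by that worker's summed squared distance to the aggregated embeddings (as in the cited paper), all reliabilities start at 1, and $c$ denotes the quantile for a worker with one completed task (the same for `p1`, `p2`, and `p3`).

```latex
\begin{align*}
\hat{e}_{t1} &= \tfrac{1}{3}\big[(1,0) + (1,0) + (0,1)\big]
              = \left(\tfrac{2}{3}, \tfrac{1}{3}\right) \\
\|e_{t1}^{p1} - \hat{e}_{t1}\|^2 = \|e_{t1}^{p2} - \hat{e}_{t1}\|^2
             &= \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 = \tfrac{2}{9} \\
\|e_{t1}^{p3} - \hat{e}_{t1}\|^2
             &= \left(\tfrac{2}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 = \tfrac{8}{9} \\
\beta_{p1} = \beta_{p2} &= \tfrac{9}{2}\,c, \qquad \beta_{p3} = \tfrac{9}{8}\,c \\
\hat{e}_{t1} &= \frac{\tfrac{9}{2}c\,(1,0) + \tfrac{9}{2}c\,(1,0) + \tfrac{9}{8}c\,(0,1)}
                     {\tfrac{9}{2}c + \tfrac{9}{2}c + \tfrac{9}{8}c}
              = \left(\tfrac{8}{9}, \tfrac{1}{9}\right)
\end{align*}
```

The quantile $c$ cancels in the weighted mean, so after one update the aggregated embedding has already moved from $(2/3, 1/3)$ toward $(1, 0)$, the embedding of output `a`.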
Method | Description |
---|---|
fit | Fits the model to the training data. |
fit_predict | Fits the model to the training data and returns the aggregated outputs. |
fit_predict_scores | Fits the model to the training data and returns the estimated scores. |
Last updated: March 31, 2023