crowdkit.aggregation.embeddings.rasa.RASA
RASA(self, n_iter: int = 100, tol: float = 1e-09, alpha: float = 0.05)
Reliability Aware Sequence Aggregation.
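RASA estimates global worker reliabilities $\beta_k$, all initialized to one.
Next, the algorithm iteratively performs two steps:

1. For each task $i$, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \beta_k e_i^k}{\sum_k \beta_k}$, where $e_i^k$ is the embedding of the output given by worker $k$ for task $i$.
2. For each worker $k$, estimate the global reliability: $\beta_k = \frac{\chi^2_{(\alpha/2, |\mathcal{V}_k|)}}{\sum_i \|e_i^k - \hat{e}_i\|^2}$, where $\mathcal{V}_k$ is the set of tasks completed by worker $k$.

Finally, the aggregated result for each task is the output whose embedding is closest to $\hat{e}_i$.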
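The loop described above can be sketched in plain Python. This is a simplified illustration, not crowdkit's implementation: the function names `rasa_sketch`, `sq_dist`, and `placeholder_quantile` are hypothetical, the chi-squared quantile in the reliability numerator is replaced by a caller-supplied `quantile` callable (the placeholder simply returns the number of tasks), and the stopping rule based on `tol` and the `eps` guard against division by zero are assumptions.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def placeholder_quantile(dof):
    """Stand-in for the chi-squared quantile (assumption: just returns dof)."""
    return float(dof)


def rasa_sketch(answers, n_iter=100, tol=1e-9,
                quantile=placeholder_quantile, eps=1e-12):
    """answers: list of (task, worker, output, embedding) tuples."""
    by_task = defaultdict(list)
    for task, worker, output, emb in answers:
        by_task[task].append((worker, output, emb))

    # Every worker starts with reliability 1.
    beta = {worker: 1.0 for _, worker, _, _ in answers}
    agg = {}

    for _ in range(n_iter):
        # Step 1: reliability-weighted mean embedding per task.
        for task, rows in by_task.items():
            total = sum(beta[w] for w, _, _ in rows)
            dim = len(rows[0][2])
            agg[task] = [
                sum(beta[w] * e[d] for w, _, e in rows) / total
                for d in range(dim)
            ]

        # Step 2: reliability is inversely proportional to the worker's
        # summed squared distance from the aggregated embeddings.
        err = defaultdict(float)
        n_tasks = defaultdict(int)
        for task, rows in by_task.items():
            for w, _, e in rows:
                err[w] += sq_dist(e, agg[task])
                n_tasks[w] += 1
        new_beta = {w: quantile(n_tasks[w]) / max(err[w], eps) for w in beta}

        # Assumed stopping rule: largest reliability change below tol.
        change = max(abs(new_beta[w] - beta[w]) for w in beta)
        beta = new_beta
        if change < tol:
            break

    # Final step: pick the output whose embedding is closest to the aggregate.
    outputs = {
        task: min(rows, key=lambda r: sq_dist(r[2], agg[task]))[1]
        for task, rows in by_task.items()
    }
    return outputs, beta


answers = [
    ("t1", "p1", "a", [1.0, 0.0]),
    ("t1", "p2", "a", [1.0, 0.0]),
    ("t1", "p3", "b", [0.0, 1.0]),
]
outputs, beta = rasa_sketch(answers)
```

In this toy input the two agreeing workers pull the aggregated embedding toward their answer, so their reliabilities grow while the dissenting worker's shrinks. A faithful version would plug a real chi-squared quantile function into `quantile`.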
Jiyi Li. A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation. Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, pages 24–28, Hong Kong, China, November 3, 2019. https://doi.org/10.18653/v1/D19-5904
Parameters | Type | Description |
---|---|---|
n_iter | int | The number of iterations. |
alpha | float | Confidence level of chi-squared distribution quantiles in beta parameter formula. |
embeddings_and_outputs_ | DataFrame | Tasks' embeddings and outputs. A pandas.DataFrame indexed by |
Examples:

```python
import numpy as np
import pandas as pd
from crowdkit.aggregation import RASA

df = pd.DataFrame(
    [
        ['t1', 'p1', 'a', np.array([1.0, 0.0])],
        ['t1', 'p2', 'a', np.array([1.0, 0.0])],
        ['t1', 'p3', 'b', np.array([0.0, 1.0])]
    ],
    columns=['task', 'worker', 'output', 'embedding']
)
result = RASA().fit_predict(df)
```
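As a hand check on this example, the first iteration can be worked out directly. The sketch assumes that the aggregated embedding is the reliability-weighted mean of the workers' embeddings and that every reliability starts at one; the symbols $\hat{e}_{t1}$ and $e^{p}_{t1}$ are notation introduced here for the aggregate and for worker $p$'s embedding on task `t1`.

```latex
\hat{e}_{t1} = \frac{1\cdot(1,0) + 1\cdot(1,0) + 1\cdot(0,1)}{1 + 1 + 1}
             = \left(\tfrac{2}{3}, \tfrac{1}{3}\right)

\|e^{p1}_{t1} - \hat{e}_{t1}\|^2 = \|e^{p2}_{t1} - \hat{e}_{t1}\|^2
             = \left(\tfrac{1}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2 = \tfrac{2}{9}

\|e^{p3}_{t1} - \hat{e}_{t1}\|^2
             = \left(\tfrac{2}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2 = \tfrac{8}{9}
```

Worker `p3` sits four times farther (in squared distance) from the aggregate than `p1` and `p2`. Since all three workers completed the same number of tasks, a reliability that is inversely proportional to this distance would be four times smaller for `p3` after the first update, which moves the next aggregate closer to `(1, 0)` and the output `'a'`.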
Method | Description |
---|---|
fit | Fit the model. |
fit_predict | Fit the model and return aggregated outputs. |
fit_predict_scores | Fit the model and return scores. |