RASA
crowdkit.aggregation.embeddings.rasa.RASA
RASA(
self,
n_iter: int = 100,
tol: float = 1e-09,
alpha: float = 0.05
)
Reliability Aware Sequence Aggregation.
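RASA estimates global workers' reliabilities $\beta$, which are initialized to ones.
Next, the algorithm iteratively performs two steps:
- For each task, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \beta_k e_i^k}{\sum_k \beta_k}$, where $e_i^k$ is the embedding of the output given by worker $k$ for task $i$
- For each worker, estimate the global reliability: $\beta_k = \frac{\chi^2_{(\alpha/2, |\mathcal{V}_k|)}}{\sum_i \left(\|e_i^k - \hat{e}_i\|^2\right)}$, where $\mathcal{V}_k$ is the set of tasks completed by worker $k$
Finally, the aggregated result for each task is the output whose embedding is closest to $\hat{e}_i$.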
Jiyi Li. A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation. Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, pages 24–28, Hong Kong, China, November 3, 2019. https://doi.org/10.18653/v1/D19-5904
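The two alternating steps and the final selection described above can be sketched in plain NumPy. This is a minimal illustration, not Crowd-Kit's implementation: the function name `rasa_sketch`, its array-based inputs, and the `1e-12` guard are hypothetical. Two further assumptions are marked in the comments: the chi-squared quantile is read as the upper-tail critical value (`scipy.stats.chi2.isf`) with as many degrees of freedom as the worker has completed tasks, and `tol` is treated as a threshold on the largest change in any reliability between iterations.

```python
import numpy as np
from scipy.stats import chi2


def rasa_sketch(embeddings, tasks, workers, n_iter=100, tol=1e-9, alpha=0.05):
    """Illustrative RASA-style loop (not crowdkit's implementation).

    embeddings: (n_answers, dim) array, one row per worker answer.
    tasks, workers: 0-based integer ids, one per answer; every id must occur.
    Returns (reliabilities, aggregated embeddings, index of chosen answer per task).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    tasks, workers = np.asarray(tasks), np.asarray(workers)
    n_tasks, n_workers = int(tasks.max()) + 1, int(workers.max()) + 1

    # Numerator of the reliability update, with |V_k| degrees of freedom.
    # Assumption: the quantile is taken as the upper-tail critical value.
    n_done = np.bincount(workers, minlength=n_workers)
    numerators = chi2.isf(alpha / 2, n_done)

    def aggregate(beta):
        # Step 1: reliability-weighted mean of the answers' embeddings per task.
        w = beta[workers]
        weighted_sum = np.zeros((n_tasks, embeddings.shape[1]))
        np.add.at(weighted_sum, tasks, w[:, None] * embeddings)
        return weighted_sum / np.bincount(tasks, weights=w, minlength=n_tasks)[:, None]

    def squared_distances(aggregated):
        # Squared Euclidean distance of each answer to its task's aggregate.
        return ((embeddings - aggregated[tasks]) ** 2).sum(axis=1)

    beta = np.ones(n_workers)  # reliabilities start at one
    for _ in range(n_iter):
        # Step 2: quantile divided by the worker's summed squared distance.
        totals = np.bincount(workers, weights=squared_distances(aggregate(beta)),
                             minlength=n_workers)
        new_beta = numerators / np.maximum(totals, 1e-12)  # avoid division by zero
        # Assumption: stop once no reliability changes by more than tol.
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break

    aggregated = aggregate(beta)
    dist = squared_distances(aggregated)
    # Final step: per task, pick the answer whose embedding is closest to the aggregate.
    chosen = np.array([np.flatnonzero(tasks == t)[np.argmin(dist[tasks == t])]
                       for t in range(n_tasks)])
    return beta, aggregated, chosen


# Two tasks, three workers; worker 2 disagrees with the other two on both tasks.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
       [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
beta, aggregated, chosen = rasa_sketch(emb, tasks=[0, 0, 0, 1, 1, 1],
                                       workers=[0, 1, 2, 0, 1, 2])
print(beta, chosen)
```

In this toy input the dissenting worker should end up with the smallest reliability, and the selected answers come from the two workers who agree. On such small, noise-free data the update tends to concentrate almost all weight on a single worker, which is why the sketch clamps the denominator.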
Parameters Description
Parameters | Type | Description |
---|---|---|
n_iter | int | A number of iterations. |
alpha | float | Confidence level of chi-squared distribution quantiles in beta parameter formula. |
embeddings_and_outputs_ | DataFrame | Tasks' embeddings and outputs. A pandas.DataFrame indexed by `task` with `embedding` and `output` columns. |
Examples:
import numpy as np
import pandas as pd
from crowdkit.aggregation import RASA
df = pd.DataFrame(
[
['t1', 'p1', 'a', np.array([1.0, 0.0])],
['t1', 'p2', 'a', np.array([1.0, 0.0])],
['t1', 'p3', 'b', np.array([0.0, 1.0])]
],
columns=['task', 'worker', 'output', 'embedding']
)
result = RASA().fit_predict(df)
Methods Summary
Method | Description |
---|---|
fit | Fit the model. |
fit_predict | Fit the model and return aggregated outputs. |
fit_predict_scores | Fit the model and return scores. |