# RASA

`crowdkit.aggregation.embeddings.rasa.RASA`

`RASA(self, n_iter: int = 100, tol: float = 1e-09, alpha: float = 0.05)`

Reliability Aware Sequence Aggregation.

RASA estimates a global reliability $\beta_k$ for each worker $k$. All reliabilities are initialized to one.

Next, the algorithm iteratively performs two steps:

1. For each task, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \beta_k e_i^k}{\sum_k \beta_k}$.

2. For each worker, estimate the global reliability: $\beta_k = \frac{\chi^2_{(\alpha/2, |\mathcal{V}_k|)}}{\sum_i\left(\|e_i^k - \hat{e}_i\|^2\right)}$, where $\mathcal{V}_k$ is the set of tasks completed by worker $k$.
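
As a small worked illustration of one pass through the two steps, consider a single task answered by three workers with embeddings $e^1 = e^2 = (1, 0)$ and $e^3 = (0, 1)$, and all $\beta_k = 1$. The shorthand $q = \chi^2_{(\alpha/2, 1)}$ is introduced here only for brevity; it is the same for all three workers because each completed exactly one task.

$$
\begin{aligned}
\hat{e} &= \frac{1 \cdot (1, 0) + 1 \cdot (1, 0) + 1 \cdot (0, 1)}{3} = \left(\tfrac{2}{3}, \tfrac{1}{3}\right), \\
\|e^1 - \hat{e}\|^2 &= \|e^2 - \hat{e}\|^2 = \tfrac{1}{9} + \tfrac{1}{9} = \tfrac{2}{9}, \qquad
\|e^3 - \hat{e}\|^2 = \tfrac{4}{9} + \tfrac{4}{9} = \tfrac{8}{9}, \\
\beta_1 &= \beta_2 = \frac{q}{2/9} = \tfrac{9}{2} q, \qquad
\beta_3 = \frac{q}{8/9} = \tfrac{9}{8} q, \\
\hat{e}^{\text{next}} &= \frac{2 \cdot \tfrac{9}{2} q \cdot (1, 0) + \tfrac{9}{8} q \cdot (0, 1)}{2 \cdot \tfrac{9}{2} q + \tfrac{9}{8} q} = \left(\tfrac{8}{9}, \tfrac{1}{9}\right).
\end{aligned}
$$

The two agreeing workers end up four times as reliable as the dissenting one, so the next aggregated embedding moves toward $(1, 0)$.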

Finally, the aggregated result for each task is the output whose embedding is closest to $\hat{e}_i$.
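
The procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not Crowd-Kit's implementation, and it rests on several assumptions: the function names `rasa_sketch` and `chi2_upper_quantile` are hypothetical; the chi-squared quantile is taken at probability $1 - \alpha/2$ (the formula's notation does not settle which tail is meant) and is computed with the Wilson-Hilferty approximation rather than an exact routine; each worker answers a task at most once; the `eps` guard against division by zero and the stopping rule based on the change in $\hat{e}_i$ are choices made for this sketch.

```python
from collections import Counter
from statistics import NormalDist

import numpy as np


def chi2_upper_quantile(alpha_half, dof):
    """Chi-squared quantile at probability 1 - alpha_half.

    Uses the Wilson-Hilferty approximation (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(1.0 - alpha_half)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def rasa_sketch(records, n_iter=100, tol=1e-9, alpha=0.05, eps=1e-12):
    """records: list of (task, worker, output, embedding) tuples."""
    tasks = sorted({task for task, _, _, _ in records})
    # |V_k|: number of tasks per worker (assumes one answer per worker per task).
    n_tasks = Counter(worker for _, worker, _, _ in records)
    numerator = {k: chi2_upper_quantile(alpha / 2.0, n) for k, n in n_tasks.items()}
    beta = {k: 1.0 for k in n_tasks}  # reliabilities start at one
    agg = None
    for _ in range(n_iter):
        # Step 1: reliability-weighted mean embedding for each task.
        weighted = {t: 0.0 for t in tasks}
        weights = {t: 0.0 for t in tasks}
        for task, worker, _, emb in records:
            weighted[task] = weighted[task] + beta[worker] * emb
            weights[task] += beta[worker]
        new_agg = {t: weighted[t] / weights[t] for t in tasks}
        # Step 2: quantile divided by the worker's summed squared distances.
        dist = {k: 0.0 for k in beta}
        for task, worker, _, emb in records:
            dist[worker] += float(np.sum((emb - new_agg[task]) ** 2))
        beta = {k: numerator[k] / max(dist[k], eps) for k in beta}
        # Stopping rule assumed for this sketch: aggregated embeddings stop moving.
        converged = agg is not None and all(
            np.max(np.abs(new_agg[t] - agg[t])) < tol for t in tasks
        )
        agg = new_agg
        if converged:
            break
    # Final step: pick the output whose embedding is closest to the aggregate.
    best = {}
    for task, _, output, emb in records:
        d = float(np.sum((emb - agg[task]) ** 2))
        if task not in best or d < best[task][0]:
            best[task] = (d, output)
    return {t: out for t, (_, out) in best.items()}, beta


records = [
    ("t1", "p1", "a", np.array([1.0, 0.0])),
    ("t1", "p2", "a", np.array([1.0, 0.0])),
    ("t1", "p3", "b", np.array([0.0, 1.0])),
]
result, beta = rasa_sketch(records)
print(result)  # {'t1': 'a'}
```

On this toy input the two workers who agree pull the aggregated embedding toward their answer, so their reliabilities grow while the third worker's shrinks.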

Jiyi Li. A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation. Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, pages 24–28, Hong Kong, China, November 3, 2019. https://doi.org/10.18653/v1/D19-5904

## Parameters Description

| Parameters | Type | Description |
| --- | --- | --- |
| `n_iter` | `int` | A number of iterations. |
| `alpha` | `float` | Confidence level of chi-squared distribution quantiles in beta parameter formula. |
| `embeddings_and_outputs_` | `DataFrame` | Tasks' embeddings and outputs. A `pandas.DataFrame` indexed by `task` with `embedding` and `output` columns. |

Examples:

```python
import numpy as np
import pandas as pd
from crowdkit.aggregation import RASA

# One task answered by three workers; each row carries the text output
# and its embedding vector.
df = pd.DataFrame(
    [
        ['t1', 'p1', 'a', np.array([1.0, 0.0])],
        ['t1', 'p2', 'a', np.array([1.0, 0.0])],
        ['t1', 'p3', 'b', np.array([0.0, 1.0])]
    ],
    columns=['task', 'worker', 'output', 'embedding']
)
result = RASA().fit_predict(df)
```

## Methods Summary

| Method | Description |
| --- | --- |
| `fit` | Fit the model. |
| `fit_predict` | Fit the model and return aggregated outputs. |
| `fit_predict_scores` | Fit the model and return scores. |