TextRASA
crowdkit.aggregation.texts.text_rasa.TextRASA
TextRASA(
self,
encoder: Callable,
n_iter: int = 100,
tol: float = 1e-05,
alpha: float = 0.05
)
RASA on text embeddings.
Given a sentence encoder, encodes texts provided by workers and runs the RASA algorithm for embedding aggregation.
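The encode-then-aggregate flow can be illustrated with a minimal sketch. It is not the library's implementation: a plain mean stands in for RASA's aggregation, and returning the worker text closest to the aggregated embedding is an assumption made for illustration. The names `toy_encoder` and `aggregate_task` are hypothetical.

```python
import numpy as np


def toy_encoder(text: str) -> np.ndarray:
    """Hypothetical stand-in for a sentence encoder: counts of the letters a-z."""
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def aggregate_task(texts, encoder):
    """Return the worker text whose embedding is closest to the mean embedding.

    The plain mean is a simplification used here instead of RASA.
    """
    embeddings = np.stack([encoder(t) for t in texts])
    center = embeddings.mean(axis=0)  # stand-in for the RASA aggregate
    distances = np.linalg.norm(embeddings - center, axis=1)
    return texts[int(np.argmin(distances))]


# Three workers transcribe the same task.
answers = ["the cat sat on the mat", "the cat sat on a mat", "teh kat sat mat"]
best = aggregate_task(answers, toy_encoder)
```

The sketch handles one task at a time and only shows where the encoder enters the pipeline. It does not iterate, and `n_iter`, `tol`, and `alpha` have no counterpart in it.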
Parameters Description
Parameters | Type | Description |
---|---|---|
encoder | Callable | A callable that takes a text and returns a NumPy array containing the corresponding embedding. |
n_iter | int | The number of RASA iterations. |
alpha | float | The confidence level of the chi-squared distribution quantiles in the beta parameter formula. |
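Any callable that maps a text to a NumPy array fits the `encoder` contract described above. As a minimal sketch, the hypothetical `hashed_bow_encoder` below builds a hashed bag-of-words vector. It is a toy stand-in for experiments without a pretrained model, not a recommended encoder, and both the function name and the `dim` parameter are assumptions made for illustration.

```python
import hashlib

import numpy as np


def hashed_bow_encoder(text: str, dim: int = 64) -> np.ndarray:
    """Toy encoder (hypothetical): hash each token into one of `dim` buckets."""
    vec = np.zeros(dim, dtype=np.float32)
    for token in text.lower().split():
        # hashlib gives the same bucket on every run, unlike the built-in hash().
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim
        vec[index] += 1.0
    norm = np.linalg.norm(vec)
    # Normalize to unit length; an empty text stays the zero vector.
    return vec / norm if norm > 0 else vec


embedding = hashed_bow_encoder("hello crowd world")
```

Under these assumptions, the function could be passed as `TextRASA(encoder=hashed_bow_encoder)`. A pretrained sentence encoder should capture meaning far better, since this toy ignores word order and semantics.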
Examples:
We suggest using sentence encoders provided by Sentence Transformers.
from crowdkit.datasets import load_dataset
from crowdkit.aggregation import TextRASA
from sentence_transformers import SentenceTransformer

# Load a pretrained sentence encoder and pass its encode method to TextRASA.
encoder = SentenceTransformer('all-mpnet-base-v2')
text_rasa = TextRASA(encoder=encoder.encode)

# Load the workers' responses and the ground truth.
df, gt = load_dataset('crowdspeech-test-clean')
# Lowercase the texts before encoding.
df['text'] = df['text'].str.lower()
result = text_rasa.fit_predict(df)
Methods Summary
Method | Description |
---|---|
fit | Fit the model. |
fit_predict | Fit the model and return aggregated texts. |
fit_predict_scores | Fit the model and return scores. |