TextRASA

crowdkit.aggregation.texts.text_rasa.TextRASA

TextRASA(
self,
encoder: Callable[[str], ...],
n_iter: int = 100,
tol: float = 1e-05,
alpha: float = 0.05
)

RASA on text embeddings.

Given a sentence encoder, encodes texts provided by workers and runs the RASA algorithm for embedding aggregation.
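
To make the aggregation step concrete, the following is a minimal sketch of a reliability-weighted embedding average in the spirit of RASA. It is not Crowd-Kit's implementation: the names `rasa_sketch` and `chi2_upper_quantile`, the nested-dict input layout, the exact form of the reliability weight (a chi-squared quantile divided by a worker's total squared distance to the aggregates), and the stopping rule based on `tol` are all assumptions made for illustration. The chi-squared quantile is approximated with the Wilson-Hilferty formula so that the sketch needs only NumPy and the standard library.

```python
import math
from statistics import NormalDist

import numpy as np


def chi2_upper_quantile(p: float, dof: int) -> float:
    """Approximate x such that P(X > x) = p for X ~ chi2(dof) (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def rasa_sketch(embeddings, n_iter=100, tol=1e-5, alpha=0.05):
    """Hypothetical helper. embeddings: {worker: {task: 1-D np.ndarray}}."""
    workers = list(embeddings)
    tasks = sorted({t for w in workers for t in embeddings[w]})
    # Assumed numerator of the reliability weight: it depends only on
    # alpha and on how many answers the worker gave.
    prior = {w: chi2_upper_quantile(alpha / 2, len(embeddings[w])) for w in workers}
    beta = {w: 1.0 for w in workers}  # start with equal reliabilities
    agg = {}
    for _ in range(n_iter):
        # Aggregation step: reliability-weighted mean embedding per task.
        new_agg = {}
        for t in tasks:
            answered = [w for w in workers if t in embeddings[w]]
            total = sum(beta[w] for w in answered)
            new_agg[t] = sum(beta[w] * embeddings[w][t] for w in answered) / total
        # Assumed stopping rule: largest movement of any aggregate is below tol.
        if agg:
            change = max(float(np.linalg.norm(new_agg[t] - agg[t])) for t in tasks)
        else:
            change = math.inf
        agg = new_agg
        if change < tol:
            break
        # Reliability step: workers far from the aggregates get smaller weights.
        for w in workers:
            dist = sum(
                float(np.sum((e - agg[t]) ** 2)) for t, e in embeddings[w].items()
            )
            beta[w] = prior[w] / max(dist, 1e-9)
    return agg, beta
```

In this sketch, a smaller `alpha` raises the quantile in the numerator, while `n_iter` caps the number of passes and `tol` ends the loop early once the aggregates stop moving.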

Parameters description

| Parameters | Type | Description |
| :--- | :--- | :--- |
| encoder | Callable[[str], ...] | A callable that takes a text and returns a NumPy array containing the corresponding embedding. |
| n_iter | int | The number of RASA iterations. |
| alpha | float | The confidence level of the chi-squared distribution quantiles in the beta parameter formula. |
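
As an illustration of the `encoder` contract described above (one text in, one NumPy array out), here is a minimal sketch of a stand-in encoder. The function `toy_encoder` and its `dim` parameter are hypothetical and not part of Crowd-Kit; it hashes words into a fixed-size vector and is only meant for quick experiments, not as a substitute for a real sentence encoder.

```python
import hashlib

import numpy as np


def toy_encoder(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in encoder: hashed bag-of-words, L2-normalized."""
    vec = np.zeros(dim, dtype=np.float64)
    for token in text.lower().split():
        # Map each token to a stable bucket index using a hash of its bytes.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = np.linalg.norm(vec)
    # Leave the all-zero vector (empty text) unchanged to avoid dividing by zero.
    return vec / norm if norm > 0 else vec
```

Given the signature above, a callable of this shape could presumably be passed as `TextRASA(encoder=toy_encoder)`.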

Examples:

We suggest using sentence encoders provided by Sentence Transformers.

from crowdkit.aggregation import TextRASA
from crowdkit.datasets import load_dataset
from sentence_transformers import SentenceTransformer

# encoder.encode maps a text to its NumPy embedding
encoder = SentenceTransformer('all-mpnet-base-v2')
rasa = TextRASA(encoder=encoder.encode)

# Load the crowdsourced transcriptions and lowercase them before aggregation
df, gt = load_dataset('crowdspeech-test-clean')
df['text'] = df['text'].str.lower()
result = rasa.fit_predict(df)

Methods summary

| Method | Description |
| :--- | :--- |
| fit | Fit the model. |
| fit_predict | Fit the model and return aggregated texts. |
| fit_predict_scores | Fit the model and return scores. |

Last updated: March 31, 2023
