crowdkit.aggregation.texts.text_hrrasa.TextHRRASA
TextHRRASA( self, encoder: Callable[[str], Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]], n_iter: int = 100, tol: float = 1e-05, lambda_emb: float = 0.5, lambda_out: float = 0.5, alpha: float = 0.05, calculate_ranks: bool = False, output_similarity: Callable[[str, List[List[str]]], float] = glue_similarity)
HRRASA on text embeddings.
Given a sentence encoder, encodes texts provided by workers and runs the HRRASA algorithm for embedding aggregation.
Parameters | Type | Description |
---|---|---|
encoder | Callable[[str], Union[_SupportsArray[dtype], _NestedSequence[_SupportsArray[dtype]], bool, int, float, complex, str, bytes, _NestedSequence[Union[bool, int, float, complex, str, bytes]]]] | A callable that takes a text and returns a NumPy array containing the corresponding embedding. |
n_iter | int | The number of HRRASA iterations. |
lambda_emb | float | The weight of the reliability calculated on embeddings. |
lambda_out | float | The weight of the reliability calculated on outputs. |
alpha | float | Confidence level of chi-squared distribution quantiles in beta parameter formula. |
calculate_ranks | bool | If true, calculate additional attribute |
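
The `encoder` contract described in the table above (a callable that takes one text and returns a fixed-size embedding) can be sketched without any model. The `toy_encoder` function below is hypothetical and is not part of Crowd-Kit; it hashes character trigrams into a small vector only to show the expected shape of such a callable. It is an illustrative stand-in, not a substitute for a real sentence encoder.

```python
import hashlib
import math
from typing import List


def toy_encoder(text: str, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for a sentence encoder (not part of Crowd-Kit).

    Hashes character trigrams of the lowercased text into a fixed-size,
    L2-normalised vector.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # Use a stable hash so the same text always maps to the same vector.
        digest = hashlib.md5(trigram.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


embedding = toy_encoder("hello world")
```

The sketch returns a plain list to stay dependency-free. Because the table describes the encoder as returning a NumPy array, wrapping the result with `numpy.asarray` before passing the callable as `encoder=` is the safer assumption.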
Examples:
We suggest using sentence encoders provided by Sentence Transformers.
```python
from crowdkit.datasets import load_dataset
from crowdkit.aggregation import TextHRRASA
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer('all-mpnet-base-v2')
hrrasa = TextHRRASA(encoder=encoder.encode)
df, gt = load_dataset('crowdspeech-test-clean')
df['text'] = df['text'].str.lower()
result = hrrasa.fit_predict(df)
```
Method | Description |
---|---|
fit_predict | Fit the model and return aggregated texts. |
fit_predict_scores | Fit the model and return scores. |