crowdkit.aggregation.embeddings.hrrasa.HRRASA
HRRASA( self, n_iter: int = 100, tol: float = 1e-09, lambda_emb: float = 0.5, lambda_out: float = 0.5, alpha: float = 0.05, calculate_ranks: bool = False, output_similarity=glue_similarity )
Hybrid Reliability and Representation Aware Sequence Aggregation.
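At the first step, HRRASA estimates local worker reliabilities, which represent how good a worker's answer is to one particular task. The local reliability of worker $k$ on task $i$ is denoted by $\gamma_i^k$ and is calculated as a sum of two terms:
$$
\gamma_i^k = \lambda_{emb}\gamma_{i,emb}^k + \lambda_{out}\gamma_{i,out}^k, \quad \lambda_{emb} + \lambda_{out} = 1.
$$
Here $\gamma_{i,emb}^k$ is a reliability calculated on the embedding column and $\gamma_{i,out}^k$ is a reliability calculated on the output column.
The $\gamma_{i,emb}^k$ is calculated by the following equation:
$$
\gamma_{i,emb}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i,\, k' \neq k} \exp\left(-\frac{\|e_i^k - e_i^{k'}\|^2}{\|e_i^k\|^2\|e_i^{k'}\|^2}\right),
$$
where $\mathcal{U}_i$ is the set of workers' responses on task $i$, $a_i^k$ is the response of worker $k$ on task $i$, and $e_i^k$ is its embedding. The $\gamma_{i,out}^k$ makes use of some similarity measure $sim$ on the output data, e.g. GLUE similarity on texts:
$$
\gamma_{i,out}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i,\, k' \neq k} sim(a_i^k, a_i^{k'}).
$$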
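The local reliability for a single task can be sketched in plain NumPy. This is an illustration rather than the library's code: it assumes the embedding term averages exp(-||e - e'||^2 / (||e||^2 ||e'||^2)) over the other responses to the same task, and it swaps in a hypothetical `exact_match` similarity instead of GLUE similarity. The names `local_reliabilities` and `exact_match` are made up for this example.

```python
import numpy as np

def local_reliabilities(embeddings, outputs, similarity, lambda_emb=0.5, lambda_out=0.5):
    """Return one local reliability per response to a single task.

    Assumes at least two responses and non-zero embeddings.
    """
    n = len(outputs)
    gammas = np.zeros(n)
    for k in range(n):
        emb_sum, out_sum = 0.0, 0.0
        for j in range(n):
            if j == k:
                continue  # a response is not compared with itself
            diff = np.sum((embeddings[k] - embeddings[j]) ** 2)
            norms = np.sum(embeddings[k] ** 2) * np.sum(embeddings[j] ** 2)
            emb_sum += np.exp(-diff / norms)                 # embedding agreement
            out_sum += similarity(outputs[k], outputs[j])   # output agreement
        # Weighted sum of the two averaged terms.
        gammas[k] = (lambda_emb * emb_sum + lambda_out * out_sum) / (n - 1)
    return gammas

def exact_match(a, b):
    # Hypothetical stand-in for a text similarity measure.
    return 1.0 if a == b else 0.0

embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
outputs = ["a", "a", "b"]
gammas = local_reliabilities(embeddings, outputs, exact_match)
```

With this toy input the two agreeing responses receive the same reliability, and it is higher than that of the dissenting response.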
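HRRASA also estimates global worker reliabilities $\beta_k$, which are initialized to ones.
Next, the algorithm iteratively performs two steps:
- For each task $i$, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \gamma_i^k \beta_k e_i^k}{\sum_k \gamma_i^k \beta_k}$, where $e_i^k$ is the embedding of the response of worker $k$ on task $i$ and $\gamma_i^k$ is its local reliability.
- For each worker $k$, estimate the global reliability: $\beta_k = \frac{\chi^2_{(\alpha/2, |\mathcal{V}_k|)}}{\sum_i \left(\|e_i^k - \hat{e}_i\|^2 / \gamma_i^k\right)}$, where $\mathcal{V}_k$ is the set of tasks completed by worker $k$.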
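One pass of the two steps above can be sketched as follows. The sketch assumes the aggregated embedding is a mean weighted by local times global reliability, and that a worker's global reliability is a chi-squared quantile divided by the sum of squared distances to the aggregated embeddings, each scaled by the local reliability. The quantile is passed in as a precomputed number (`chi2_quantile`, a hypothetical argument), so no statistics library is required; all function names are illustrative.

```python
import numpy as np

def aggregate_embedding(embeddings, gammas, betas):
    """Weighted mean of the responses to ONE task.

    embeddings: (n, d) array, gammas and betas: (n,) arrays aligned with it.
    """
    weights = gammas * betas
    return (weights[:, None] * embeddings).sum(axis=0) / weights.sum()

def global_reliability(embeddings, aggregated, gammas, chi2_quantile):
    """Global reliability of ONE worker over the m tasks they completed.

    embeddings and aggregated: (m, d) arrays, gammas: (m,) array.
    chi2_quantile is assumed to be computed by the caller.
    """
    sq_dist = np.sum((embeddings - aggregated) ** 2, axis=1)
    return chi2_quantile / np.sum(sq_dist / gammas)

# Step 1: three responses to one task, global reliabilities initialized to ones.
task_embeddings = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
task_gammas = np.array([0.5, 0.5, 0.1])
e_hat = aggregate_embedding(task_embeddings, task_gammas, np.ones(3))

# Step 2: one worker who answered two tasks (toy aggregated embeddings).
worker_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
worker_aggregated = np.zeros((2, 2))
beta = global_reliability(worker_embeddings, worker_aggregated,
                          np.array([0.5, 0.25]), chi2_quantile=3.0)
```

A full implementation would repeat both steps for every task and worker until `n_iter` iterations are done or the change falls below `tol`.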
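Finally, the aggregated result for each task is the output whose embedding is closest to the aggregated embedding $\hat{e}_i$. If calculate_ranks is true, the method also calculates a rank for each worker's response as
$$
s_i^k = \beta_k \exp\left(-\frac{\|e_i^k - \hat{e}_i\|^2}{\|e_i^k\|^2\|\hat{e}_i\|^2}\right) + \gamma_i^k,
$$
where $e_i^k$ is the embedding of the response of worker $k$ on task $i$, $\beta_k$ is the worker's global reliability, and $\gamma_i^k$ is the local reliability of that response.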
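The final selection and the optional ranking can be sketched like this. Two assumptions are made for illustration: "closest" is measured by squared Euclidean distance (the library may use a different distance), and the rank of a response is its worker's global reliability times exp(-||e - ê||^2 / (||e||^2 ||ê||^2)) plus the local reliability. The function names are hypothetical.

```python
import numpy as np

def closest_index(embeddings, aggregated):
    """Index of the response whose embedding is nearest to the aggregate.

    Squared Euclidean distance is an assumption made for this sketch.
    """
    sq_dist = np.sum((embeddings - aggregated) ** 2, axis=1)
    return int(np.argmin(sq_dist))

def rank_scores(embeddings, aggregated, betas, gammas):
    """Rank score for each response to one task (non-zero embeddings assumed)."""
    sq_dist = np.sum((embeddings - aggregated) ** 2, axis=1)
    norms = np.sum(embeddings ** 2, axis=1) * np.sum(aggregated ** 2)
    return betas * np.exp(-sq_dist / norms) + gammas

embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
aggregated = np.array([1.0, 0.0])
betas = np.array([1.0, 1.0])    # global reliability of each response's worker
gammas = np.array([0.5, 0.1])   # local reliability of each response

best = closest_index(embeddings, aggregated)
scores = rank_scores(embeddings, aggregated, betas, gammas)
```

In this toy case the first response coincides with the aggregated embedding, so it is selected and also gets the higher rank score.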
Jiyi Li. Crowdsourced Text Sequence Aggregation based on Hybrid Reliability and Representation. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20), July 25–30, 2020, Virtual Event, China. ACM, New York, NY, USA,
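|Parameter|Description|
|---|---|
|n_iter|The number of iterations.|
|lambda_emb|The weight of the reliability calculated on embeddings.|
|lambda_out|The weight of the reliability calculated on outputs.|
|alpha|Confidence level of the chi-squared distribution quantiles in the beta parameter formula.|
|calculate_ranks|If true, additionally calculate ranks for each worker's response.|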
import numpy as np
import pandas as pd
from crowdkit.aggregation import HRRASA

# Each row is one worker's response to a task, together with its embedding.
df = pd.DataFrame(
    [
        ['t1', 'p1', 'a', np.array([1.0, 0.0])],
        ['t1', 'p2', 'a', np.array([1.0, 0.0])],
        ['t1', 'p3', 'b', np.array([0.0, 1.0])]
    ],
    columns=['task', 'worker', 'output', 'embedding']
)
result = HRRASA().fit_predict(df)
|Method|Description|
|---|---|
|fit|Fit the model.|
|fit_predict|Fit the model and return aggregated outputs.|
|fit_predict_scores|Fit the model and return scores.|