crowdkit.aggregation.embeddings.hrrasa.HRRASA
HRRASA( self, n_iter: int = 100, tol: float = 1e-09, lambda_emb: float = 0.5, lambda_out: float = 0.5, alpha: float = 0.05, calculate_ranks: bool = False, output_similarity: Callable[[str, List[List[str]]], float] = glue_similarity)
The Hybrid Reliability and Representation Aware Sequence Aggregation (HRRASA) algorithm consists of four steps.
Step 1. Encode the worker answers into embeddings.
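Step 2. Estimate the local workers' reliabilities that represent how well a worker responds to one particular task. The local reliability of the worker $k$ on the task $i$ is denoted by $\gamma_i^k$ and is calculated by incorporating both types of representations:
$$\gamma_i^k = \lambda_{emb}\gamma_{i,emb}^k + \lambda_{out}\gamma_{i,out}^k, \quad \lambda_{emb} + \lambda_{out} = 1,$$
where the $\gamma_{i,emb}^k$ value is a reliability calculated on `embedding`, and the $\gamma_{i,out}^k$ value is a reliability calculated on `output`.
The $\gamma_{i,emb}^k$ value is calculated by the following equation:
$$\gamma_{i,emb}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i,\, k' \neq k}\exp\left(-\frac{\|e_i^k - e_i^{k'}\|^2}{\|e_i^k\|^2\|e_i^{k'}\|^2}\right),$$
where $\mathcal{U}_i$ is the set of workers' responses on task $i$, $a_i^k$ is the response of worker $k$, and $e_i^k$ is its embedding.
The $\gamma_{i,out}^k$ value uses some similarity measure $sim$ on the `output` data, e.g. GLEU similarity on texts:
$$\gamma_{i,out}^k = \frac{1}{|\mathcal{U}_i| - 1}\sum_{a_i^{k'} \in \mathcal{U}_i,\, k' \neq k} sim(a_i^k, a_i^{k'}).$$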
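The local reliability of Step 2 can be sketched in plain NumPy for the responses to a single task. This is an illustrative sketch, not the library's implementation: the function name `local_reliabilities` and the `similarity` argument are hypothetical, and the embedding term is assumed to use a negative exponent so that closer embeddings score higher.

```python
import numpy as np


def local_reliabilities(embeddings, outputs, similarity, lambda_emb=0.5, lambda_out=0.5):
    """Return one local reliability per response to a single task.

    embeddings: array-like of shape (n_responses, dim).
    outputs: list of n_responses outputs (e.g. texts).
    similarity: hypothetical callable(a, b) -> float standing in for GLEU.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n = len(outputs)
    if n < 2:
        raise ValueError("at least two responses per task are needed")

    gammas = np.zeros(n)
    for k in range(n):
        emb_rel = 0.0
        out_rel = 0.0
        for j in range(n):
            if j == k:
                continue
            # Squared distance normalised by the product of squared norms.
            sq_dist = np.sum((embeddings[k] - embeddings[j]) ** 2)
            sq_norms = np.sum(embeddings[k] ** 2) * np.sum(embeddings[j] ** 2)
            emb_rel += np.exp(-sq_dist / sq_norms)  # assumed negative exponent
            out_rel += similarity(outputs[k], outputs[j])
        # Average over the other responses, then mix the two reliabilities.
        gammas[k] = (lambda_emb * emb_rel + lambda_out * out_rel) / (n - 1)
    return gammas
```

With an exact-match similarity such as `lambda a, b: float(a == b)`, two identical responses receive a higher local reliability than a response that disagrees with both.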
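Step 3. Estimate the global workers' reliabilities $\beta$ by iteratively performing two steps:
1. For each task, estimate the aggregated embedding: $\hat{e}_i = \frac{\sum_k \gamma_i^k \beta_k e_i^k}{\sum_k \gamma_i^k \beta_k}$.
2. For each worker, estimate the global reliability: $\beta_k = \frac{\chi^2_{(\alpha/2, |\mathcal{V}_k|)}}{\sum_i \left(\|e_i^k - \hat{e}_i\|^2 / \gamma_i^k\right)}$, where $\mathcal{V}_k$ is a set of tasks completed by the worker $k$.
Here $\gamma_i^k$ is the local reliability from Step 2 and $e_i^k$ is the embedding of the response of worker $k$ to task $i$.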
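The alternating updates of Step 3 can be sketched as follows. This is a rough illustration under several assumptions rather than the library's code: the function name `global_reliabilities` is hypothetical, the global reliabilities start at 1, the chi-squared quantile is taken as the lower-tail quantile at `alpha / 2` with one degree of freedom per completed task, the loss is a reliability-weighted sum of squared distances, and a small `eps` guards against division by zero.

```python
import numpy as np
from scipy.stats import chi2


def global_reliabilities(tasks, workers, embeddings, gammas,
                         alpha=0.05, n_iter=100, tol=1e-9, eps=1e-12):
    """Alternate between aggregated embeddings and global reliabilities.

    Each position in the inputs describes one (task, worker) response.
    Returns ({task: aggregated embedding}, {worker: global reliability}).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    task_ids, t_idx = np.unique(np.asarray(tasks), return_inverse=True)
    worker_ids, w_idx = np.unique(np.asarray(workers), return_inverse=True)

    # Chi-squared quantile per worker; degrees of freedom = tasks completed.
    quantiles = chi2.ppf(alpha / 2, np.bincount(w_idx))

    betas = np.ones(len(worker_ids))  # assumed initial value
    prev_loss = np.inf
    for _ in range(n_iter):
        # Aggregated embedding: weighted mean with weights gamma * beta.
        weights = gammas * betas[w_idx]
        weighted_sum = np.zeros((len(task_ids), embeddings.shape[1]))
        np.add.at(weighted_sum, t_idx, weights[:, None] * embeddings)
        weight_total = np.bincount(t_idx, weights=weights, minlength=len(task_ids))
        agg = weighted_sum / weight_total[:, None]

        # Global reliability: quantile over the gamma-scaled squared distances.
        sq_dist = np.sum((embeddings - agg[t_idx]) ** 2, axis=1)
        denom = np.bincount(w_idx, weights=sq_dist / gammas, minlength=len(worker_ids))
        betas = quantiles / np.maximum(denom, eps)

        loss = float(np.sum(weights * sq_dist))  # assumed loss definition
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss

    return dict(zip(task_ids.tolist(), agg)), dict(zip(worker_ids.tolist(), betas))
```

Workers whose responses stay close to the aggregated embeddings end up with larger global reliabilities, which in turn pulls the aggregated embeddings further towards their responses.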
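Step 4. Estimate the aggregated result. It is the output whose embedding is the closest to the aggregated embedding $\hat{e}_i$. If `calculate_ranks` is true, the method also calculates ranks for each worker response as
$$s_i^k = \beta_k \exp\left(-\frac{\|e_i^k - \hat{e}_i\|^2}{\|e_i^k\|^2\|\hat{e}_i\|^2}\right) + \gamma_i^k.$$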
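Step 4 for a single task can be sketched as below. The function name `pick_output_and_ranks` is hypothetical, and the rank is assumed to combine the worker's global reliability, scaled by an exponential of the normalised squared distance to the aggregated embedding, with the local reliability of the response.

```python
import numpy as np


def pick_output_and_ranks(embeddings, outputs, agg_embedding, betas, gammas):
    """Select the output nearest to the aggregated embedding and rank all responses.

    betas: global reliability of the worker behind each response.
    gammas: local reliability of each response.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    agg = np.asarray(agg_embedding, dtype=float)

    # The aggregated result is the response with the smallest squared distance.
    sq_dist = np.sum((embeddings - agg) ** 2, axis=1)
    best = int(np.argmin(sq_dist))

    # Assumed rank: beta * exp(-normalised squared distance) + gamma.
    sq_norms = np.sum(embeddings ** 2, axis=1) * np.sum(agg ** 2)
    ranks = np.asarray(betas, dtype=float) * np.exp(-sq_dist / sq_norms) \
        + np.asarray(gammas, dtype=float)
    return outputs[best], ranks
```

Ties in distance are resolved by `np.argmin`, which returns the first minimum, so the earliest matching response is selected in this sketch.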
Jiyi Li. Crowdsourced Text Sequence Aggregation based on Hybrid Reliability and Representation.
In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), China (July 25–30, 2020), 1761-1764.
https://doi.org/10.1145/3397271.3401239
Parameters | Type | Description |
---|---|---|
n_iter | int | The maximum number of iterations. |
tol | float | The tolerance stopping criterion for iterative methods with a variable number of steps. The algorithm converges when the loss change is less than the `tol` value. |
lambda_emb | float | The weight of reliability calculated on embeddings. |
lambda_out | float | The weight of reliability calculated on outputs. |
alpha | float | The significance level of the chi-squared distribution quantiles in the global reliability formula (Step 3). |
calculate_ranks | bool | Specifies whether ranks are additionally calculated for each worker response. |
_output_similarity | - | The similarity measure of the `output` data. |
embeddings_and_outputs_ | - | The task embeddings and outputs. |
loss_history_ | List[float] | A list of loss values during training. |
Examples:
```python
import numpy as np
import pandas as pd
from crowdkit.aggregation import HRRASA

df = pd.DataFrame(
    [
        ['t1', 'p1', 'a', np.array([1.0, 0.0])],
        ['t1', 'p2', 'a', np.array([1.0, 0.0])],
        ['t1', 'p3', 'b', np.array([0.0, 1.0])]
    ],
    columns=['task', 'worker', 'output', 'embedding']
)
result = HRRASA().fit_predict(df)
```
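The constructor signature accepts an `output_similarity` callable of type `Callable[[str, List[List[str]]], float]`. A minimal sketch of a replacement measure matching that type is shown below. The function `jaccard_similarity` is hypothetical, and it assumes the first argument is a whitespace-separated text and the second is a list of tokenized references; how the library actually calls the measure is not specified here.

```python
from typing import List


def jaccard_similarity(hyp: str, refs: List[List[str]]) -> float:
    """Mean Jaccard overlap between the hypothesis tokens and each reference."""
    if not refs:
        return 0.0
    hyp_tokens = set(hyp.split())
    scores = []
    for ref in refs:
        ref_tokens = set(ref)
        union = hyp_tokens | ref_tokens
        # Two empty token sets are treated as identical.
        scores.append(len(hyp_tokens & ref_tokens) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

Under these assumptions it could be supplied as `HRRASA(output_similarity=jaccard_similarity)` in place of the default `glue_similarity`.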
Method | Description |
---|---|
fit | Fits the model to the training data. |
fit_predict | Fits the model to the training data and returns the aggregated outputs. |
fit_predict_scores | Fits the model to the training data and returns the estimated scores. |
Last updated: March 31, 2023