Toloka documentation


crowdkit.aggregation.embeddings.closest_to_average.ClosestToAverage

ClosestToAverage(self, distance: Callable[[..., ...], float])

Closest to Average - chooses the output with the embedding closest to the average embedding.

This method takes a DataFrame containing four columns: task, worker, output, and embedding. Here, the embedding is a vector representation of the output. The output can be any type of data, such as text, images, or NumPy arrays. For each task, the method returns the output whose embedding is closest to the average embedding of that task's responses.
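
The selection rule described above can be illustrated without the library. The following is a minimal sketch, not crowdkit's implementation: the helper name `closest_to_average`, the tuple-based input, and the sample data are assumptions made for illustration, and Euclidean distance (`math.dist`) stands in for whatever distance callable is actually supplied.

```python
import math
from collections import defaultdict


def closest_to_average(responses, distance=math.dist):
    """Pick, per task, the output whose embedding is nearest the mean embedding.

    `responses` is an iterable of (task, worker, output, embedding) tuples,
    mirroring the four columns described above. Hypothetical helper, for
    illustration only.
    """
    by_task = defaultdict(list)
    for task, _worker, output, embedding in responses:
        by_task[task].append((output, list(embedding)))

    result = {}
    for task, items in by_task.items():
        n = len(items)
        # Component-wise mean of all embeddings submitted for this task.
        average = [sum(col) / n for col in zip(*(emb for _, emb in items))]
        # Keep the output whose embedding minimizes the distance to the mean.
        best_output, _ = min(items, key=lambda item: distance(item[1], average))
        result[task] = best_output
    return result


# Made-up responses: three workers answer each of two tasks.
responses = [
    ("t1", "w1", "a cat", [0.0, 0.0]),
    ("t1", "w2", "the cat", [1.0, 1.0]),
    ("t1", "w3", "a dog", [5.0, 5.0]),
    ("t2", "w1", "hello", [1.0, 0.0]),
    ("t2", "w2", "hullo", [3.0, 0.0]),
    ("t2", "w3", "help", [2.0, 0.5]),
]
aggregated = closest_to_average(responses)
```

For task `t1` the average embedding is `[2.0, 2.0]`, so the response embedded at `[1.0, 1.0]` is selected. The result is always one of the submitted outputs, never a synthesized one, which is why the output itself can be of any type.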

Parameters Description

Parameter Type Description
distance Callable[[..., ...], float] A callable that takes two NumPy arrays and returns a single float number: the distance between these two vectors.
embeddings_and_outputs_ DataFrame Tasks' embeddings and outputs. A pandas.DataFrame indexed by task with embedding and output columns.
scores_ DataFrame Tasks' label scores. A pandas.DataFrame indexed by task such that result.loc[task, label] is the score of label for task.
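
As an illustration of the distance parameter described above, here is a sketch of two callables that fit the stated contract (two NumPy arrays in, one float out). The function names are hypothetical and neither is claimed to be a library default; the cosine variant assumes neither vector is all zeros.

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line (L2) distance between the two vectors.
    return float(np.linalg.norm(a - b))


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 1 - cosine similarity; assumes both vectors have non-zero norm.
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


d_euclid = euclidean_distance(np.array([3.0, 4.0]), np.array([0.0, 0.0]))
d_cos = cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Either function could be passed as the `distance` argument of the constructor shown in the signature. Cosine distance ignores vector length, which may suit embeddings whose norm carries no meaning, while Euclidean distance is sensitive to it.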

Methods Summary

Method Description
fit Fit the model.
fit_predict Fit the model and return the aggregated results.
fit_predict_scores Fit the model and return the estimated scores.