ClosestToAverage

crowdkit.aggregation.embeddings.closest_to_average.ClosestToAverage

ClosestToAverage(self, distance: Callable[[..., ...], float])

Closest to Average - chooses the output with the embedding closest to the average embedding.

This method takes a DataFrame containing four columns: task, worker, output, and embedding. The embedding is a vector representation of the output, which may be any type of data, such as text, images, or NumPy arrays. The method returns, for each task, the output whose embedding is closest to the average embedding of that task's responses.
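The selection rule described above can be sketched in plain Python. This is a minimal illustration of the idea, not Crowd-Kit's implementation: the function name `closest_to_average`, the tuple-based input (in place of a DataFrame), and the choice of `math.dist` as the distance are all assumptions made for the example.

```python
import math
from collections import defaultdict


def closest_to_average(rows, distance):
    """Pick, for every task, the output whose embedding is nearest to the mean.

    rows: iterable of (task, worker, output, embedding) tuples, where each
    embedding is a sequence of floats. Hypothetical helper, for illustration.
    """
    by_task = defaultdict(list)
    for task, _worker, output, embedding in rows:
        by_task[task].append((output, list(embedding)))

    result = {}
    for task, responses in by_task.items():
        # Component-wise mean of all embeddings submitted for this task.
        n = len(responses)
        average = [sum(column) / n for column in zip(*(e for _, e in responses))]
        # Keep the output with the smallest distance to that mean.
        best_output, _ = min(responses, key=lambda r: distance(r[1], average))
        result[task] = best_output
    return result


rows = [
    ("t1", "w1", "cat", [0.0, 0.0]),
    ("t1", "w2", "kitten", [1.0, 0.0]),
    ("t1", "w3", "dog", [5.0, 0.0]),
    ("t2", "w1", "red", [1.0, 1.0]),
]
chosen = closest_to_average(rows, math.dist)
```

For task `t1` the mean embedding is `[2.0, 0.0]`, so the response `"kitten"` at distance 1 is selected over `"cat"` (distance 2) and `"dog"` (distance 3). Task `t2` has a single response, which is returned unchanged.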

Parameters Description

Parameters | Type | Description
distance | Callable[[..., ...], float] | A callable that takes two NumPy arrays and returns a single float: the distance between the two vectors.
embeddings_and_outputs_ | DataFrame | Tasks' embeddings and outputs. A pandas.DataFrame indexed by task with embedding and output columns.
scores_ | DataFrame | Tasks' label scores. A pandas.DataFrame indexed by task such that result.loc[task, label] is the score of label for task.
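Any function that takes two vectors and returns a float can serve as the distance parameter described above. As a hedged illustration, the sketch below defines a cosine distance. The name `cosine_distance` is hypothetical, and the function assumes two equal-length, non-zero vectors; it accepts any sequences of floats, which includes one-dimensional NumPy arrays.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return float(1.0 - dot / (norm_a * norm_b))


# Vectors pointing the same way are at distance 0; orthogonal ones at distance 1.
same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])
```

A callable like this would be passed as the distance argument in the constructor signature shown at the top of the page.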

Methods Summary

Method | Description
fit | Fits the model.
fit_predict | Fit the model and return the aggregated results.
fit_predict_scores | Fit the model and return the estimated scores.