ClosestToAverage

crowdkit.aggregation.embeddings.closest_to_average.ClosestToAverage

ClosestToAverage(self, distance: Callable[[..., ...], float])

The Closest to Average aggregation model chooses the output with the embedding that's closest to the average embedding.

This method takes a DataFrame containing four columns: task, worker, output, and embedding. The embedding is a vector representation of the output, which may be any type of data, such as text, images, or NumPy arrays. For each task, the method returns the output whose embedding is closest to the average embedding of that task's responses.
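The selection rule described above can be sketched in plain Python. This is an illustration only, not Crowd-Kit's implementation: the function name `closest_to_average`, the tuple-based input, and the default Euclidean distance are assumptions made to keep the example self-contained, whereas the library itself works on a pandas DataFrame.

```python
from collections import defaultdict
from math import dist  # Euclidean distance between two points (Python 3.8+)


def closest_to_average(rows, distance=dist):
    """Pick, per task, the output whose embedding is nearest the task's mean embedding.

    rows: iterable of (task, worker, output, embedding) tuples, where embedding
    is a sequence of floats. Hypothetical helper, not part of Crowd-Kit.
    """
    by_task = defaultdict(list)
    for task, _worker, output, embedding in rows:
        by_task[task].append((output, list(embedding)))

    result = {}
    for task, responses in by_task.items():
        n = len(responses)
        # Component-wise mean of all embeddings submitted for this task.
        average = [sum(column) / n for column in zip(*(e for _, e in responses))]
        # Keep the response whose embedding is closest to that mean.
        best_output, _ = min(responses, key=lambda r: distance(r[1], average))
        result[task] = best_output
    return result


# Toy data: the embeddings are made up for illustration.
rows = [
    ("t1", "w1", "cat", [0.0, 0.0]),
    ("t1", "w2", "kitten", [1.0, 0.0]),
    ("t1", "w3", "dog", [5.0, 0.0]),
    ("t2", "w1", "red", [1.0, 1.0]),
    ("t2", "w2", "crimson", [2.0, 1.0]),
    ("t2", "w3", "blue", [9.0, 1.0]),
]
aggregated = closest_to_average(rows)
```

For task `t1` the mean embedding is `[2.0, 0.0]`, so `kitten` (distance 1) is chosen over `cat` (distance 2) and `dog` (distance 3). Note that the result is always one of the submitted outputs, never a synthesized one.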

Parameters description

| Parameters | Type | Description |
| --- | --- | --- |
| distance | Callable[[..., ...], float] | A callable that takes two NumPy arrays (the task embedding and the aggregated embedding) and returns a single float: the distance between these two vectors. |
| embeddings_and_outputs_ | DataFrame | The task embeddings and outputs. The pandas.DataFrame data is indexed by task and has the embedding and output columns. |
| scores_ | DataFrame | The task label scores. The pandas.DataFrame data is indexed by task so that result.loc[task, label] is a score of label for task. |
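As a rough illustration of the `distance` parameter, the two functions below have the required shape: two vectors in, one float out. The names `euclidean_distance` and `cosine_distance` are hypothetical and not part of Crowd-Kit. They are written against plain sequences so the example runs without dependencies; since they only iterate over their arguments, 1-D NumPy arrays should work as well, and with NumPy one would more typically write something like `np.linalg.norm(a - b)`.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Made-up vectors standing in for a task embedding and an aggregated embedding.
task_embedding = [3.0, 4.0]
aggregated_embedding = [0.0, 0.0]
d = euclidean_distance(task_embedding, aggregated_embedding)
```

Which distance suits a given task depends on the embedding model: cosine distance ignores vector magnitude, while Euclidean distance does not.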

Methods summary

| Method | Description |
| --- | --- |
| fit | Fits the model to the training data. |
| fit_predict | Fits the model to the training data and returns the aggregated outputs. |
| fit_predict_scores | Fits the model to the training data and returns the estimated scores. |

Last updated: March 31, 2023
